The First Conference on the Design of Experiments in Army Research,
Development and Testing was held on October 19-21, 1955 at the Diamond
Ordnance Fuze Laboratories and the National Bureau of Standards, and its
Proceedings have been published. On the basis of the success of this
Conference, the Army Mathematics Steering Committee of the Research and
Development Office of the Department of the Army decided that a similar
Conference should be organized and held during the fall of 1956.

Accordingly, the Second Conference was held on October 17-19, 1956
at the Diamond Ordnance Fuze Laboratories and the National Bureau of
Standards.

The organization of the Second Conference was similar to that of the First
Conference. There were three categories of sessions. The first category
consisted of invited papers by well-known authorities in the design of
experiments. The second consisted of technical papers contributed by
research workers from the various Army research, development and testing
facilities. The third category was composed of clinical sessions devoted
to presentation and discussion of partially solved or unsolved problems
which had arisen in these facilities. The program of the three-day
conference appears on the next few pages of these Proceedings.

The Second Conference was attended by 181 registrants and participants
from 67 organizations. Speakers and other participants came from the
Bell Telephone Laboratories, General Electric Company, National Bureau of
Standards, National Institute of Health, Princeton University, University
of North Carolina, Virginia Polytechnic Institute, and 17 Army facilities.

The present volume of Proceedings contains 26 papers and an appendix
which contains 3 classified papers, all of which were presented at the
Conference. The papers are being made available in this form as a
contribution to wider dissemination and use of modern statistical principles
of the design of experiments in research, development, and testing work of
concern to the Army.

The members of the Army Mathematics Steering Committee take this
opportunity to express their thanks to those research workers in the various
Army research, development, and testing facilities who participated in the
Conference; to Lt. Colonel J. A. Ulrich, the Commanding Officer of the
Diamond Ordnance Fuze Laboratories, and to Dr. A. V. Astin, the Director of
the National Bureau of Standards, for making available the excellent
facilities of their two organizations for the Conference; to Mr. John A.
Wheeler, who handled the details of the local arrangements for the Conference
at both installations; and to Dr. F. G. Dressel of the Office of Ordnance
Research, who carried through the details, including all correspondence,
involved in organizing the Conference and in preparing these Proceedings.

AN EXAMPLE OF DESIGN OF EXPERIMENTS AT
THE  NATIONAL  BUREAU  OF  STANDARDS 

R.  D.  Huntoon 

The  National  Bureau  of  Standards 

I  wish  to  extend  a  dual  welcome  to  the  members  of  the  Second  Joint 
Conference  on  the  Design  of  Experiments  in  Army  Research,  Development  and 
Testing.  You  are  hereby  welcomed  to  the  laboratories  of  the  National  Bureau 
of Standards and to the Diamond Ordnance Fuze Laboratories. We both wish
you every success in this, your second conference.

To some of you, it may seem a little confusing that you came to DOFL
for the conference and find your meeting starting off in NBS. It may help
if I explain that DOFL was, until 1953, a part of NBS. At that time, the
ordnance activities of NBS were transferred to the Department of Defense
and DOFL was established as a facility of the Office of the Chief of Ordnance,
Department of the Army. This was in some respects merely a change of title,
since essentially the same people are doing the same work in the same
laboratories, and we still work closely and harmoniously together as we did
earlier.

The reason for this separation is interesting and worth discussing
briefly, for it gives an insight into the aims and missions of the two
institutions. The statutory functions of NBS, as authorized by the Congress,
are six in number. They fall into two groups which I like to call direct and
indirect. Stated briefly, the direct functions are:

1.  Development  and  custody  of  the  national  standards  and  their 
dissemination  via  calibrations. 

2.  Determination  of  physical  constants  and  critical  properties  of 
materials. 

3.  Development  of  methods  of  testing  materials,  mechanisms  and  structures. 

An institution which is properly staffed and equipped to fulfill these
functions in all the fields of the physical sciences is in a unique position
in the Government to perform additional functions which derive from these
three. The derived functions are:

4.  Cooperation with other government agencies and private
organizations in the development of codes and specifications.

5.  Scientific and technical advice and consultation service to other
government agencies.

6.  Invention  and  development  of  devices  to  serve  the  special  needs  of 
the  government. 

Before proceeding with the discussion, it is appropriate to pause here
and emphasize the fact, which should be clear from the statement of the
functions, that NBS is not a consumer testing organization, as is sometimes
mistakenly believed. It is an institution devoted to the science of
measurement as a service to the country's scientists and engineers.

The advent of the last war naturally brought great emphasis on the
third member of the trilogy of derived functions, i.e., the invention and
development of devices to serve the special needs of the government. During
the war and the years immediately following, there grew up within NBS an
institution within an institution whose mission was to perform research and
development leading to end item hardware for military use. In fact, this
institution, already known as the Diamond Ordnance Laboratories, had grown to
the point where its program was larger than that of the rest of NBS. A
careful study of the situation in 1953 led to the recommendation that the
"Diamond Laboratories" should become a separate institution, and the
recommendation was implemented. We now work together compatibly, each toward
its own objectives with mutual assistance and sharing of facilities.

The importance of design of experiment is well recognized in both
institutions and in fact we consult and collaborate from time to time in
the design of experiment in the full technical sense of the term. We are,
therefore, pleased to have this conference assemble here, for we feel that
our staffs will benefit from the stimulating new information and points of
view which should emerge from these meetings.

And now it is interesting to turn for a few moments from the general
to the specific and take a brief look at an example of design of experiment
in progress in the physical constants work at NBS.

We, along with the other national standardizing laboratories of the
world, are engaged in devising new experiments for a precise determination
of the acceleration of gravity, g. Strictly speaking, g is not a physical
constant, although it is commonly referred to as one. It varies from place
to place over the surface of the earth and very slightly from time to time
at any one place. However, it is essentially a constant at any one place
and the changes between locations can be very precisely determined. The
problem is to measure its absolute magnitude at some one selected place.

Our interest in the problem arises this way. In order to have a
consistent set of units and standards in the various fields of science, each
must be appropriately related to the arbitrary prototype standards of mass,
length, time and temperature through an unbroken chain of measurement. The
determination of g provides the transfer from these to force measurements
and thence, for example, to the electrical standards and via them to our
knowledge of the fundamental atomic constants, e, h, m, etc.

The unit of force follows from Newton's law

    f = m a

as that force which will impart unit acceleration to unit mass. Now the
attraction of the earth provides a convenient reproducible force acting
upon every mass. Unfortunately, this force at the surface of the earth,
where we are interested in it, is not unity on unit mass. If a mass is
allowed to fall (accelerate), it does not accelerate with unit acceleration
but with an acceleration g. However, if we measure carefully the
acceleration g, we can then measure a force by means of a balance. We let the
force pull one arm of the balance and hang weights from the other arm until
true balance is indicated. If m is the mass of the weights added, then the
unknown force is given by

    f = m g.

We thus see that g is a transfer constant enabling us to make force
measurements in terms of our standards of mass, length and time, for the
measurement of g is essentially a precise determination of how long (time)
it takes a body to fall a given distance (length).
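
The two relations just described can be put into a few lines of arithmetic. The sketch below is only an illustration, not the NBS procedure: it assumes an idealized body dropped from rest in vacuum, so that distance = g t^2 / 2, and the function names and sample readings are hypothetical. It also shows why timing is the critical measurement, since to first order a relative error in the time enters g twice.

```python
def g_from_free_fall(distance, fall_time):
    """Return g for a body dropped from rest: distance = g * fall_time**2 / 2."""
    return 2.0 * distance / fall_time ** 2


def force_from_balance(mass, g):
    """Return the force balanced by weights of the given mass: f = m * g."""
    return mass * g


def relative_error_in_g(rel_err_distance, rel_err_time):
    """First-order worst-case relative error of g = 2 s / t**2."""
    return abs(rel_err_distance) + 2.0 * abs(rel_err_time)


# Hypothetical readings (not NBS data): a 100 cm drop timed at 0.45175 sec.
g = g_from_free_fall(100.0, 0.45175)        # cm/sec^2
force = force_from_balance(50.0, g)         # dynes, for a 50 gram mass
needed = relative_error_in_g(0.0, 0.5e-6)   # timing error of 0.5 ppm alone

print(g, force, needed)
```

With a perfect length measurement, holding g to 1 part per million therefore requires the fall time to be known to about half a part per million.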

You may be thinking that it should be possible to arrange a force which
would give unit acceleration to unit mass and use it for our standard. This
could, of course, be done but no one has devised a system which will do it
as precisely and reproducibly as the scheme which uses the attraction of
the earth.

Now, our electrical standards are based upon the ampere and the ohm.
To determine the ampere, in absolute units, we measure the force between
two conductors carrying a current. Thus, g gets into the ampere. The
ohm does not involve it, so we drop it from consideration here. Our
measurements of many constants and in particular the atomic constants are
done by means of electric and magnetic fields and hence involve the ampere
and so, unavoidably, g.

It is indeed surprising to find that our presently accepted value of
this important transfer constant g depends upon three "independent"
measurements all using the method of the Kater reversible pendulum.
The results of these determinations are referred, by means of very
precise transfer measurements, to one specific location, Potsdam, Germany.
They are shown in the following table.


Location                     Year   Observers              g (cm/sec^2)

Potsdam                      1906   Kuhnen & Furtwangler   980.100
Dryden Revision              1942   Dryden (NBS)           980.088
Washington (NBS)             1936   Heyl & Cook            980.080
Teddington, England (NPL)    1939   Clark                  980.084

Mean of last three                                         980.084
P.E. of mean                                               2 in 10^6


This looks like very good agreement but attention should be called to
the 1942 revision of the 1906 measurement. This shows that a later look
at the same data brings a change of about 12 parts per million. Also, all
the measurements are subject to the same possible systematic errors and so
the measurements are not truly independent. In fact, study shows that a
systematic error estimated to be as large as 15 ppm could be present.

Thus, the experiments show that the probable error for measurements of
g by reversible pendulums is about 2 ppm. There is already some preliminary
evidence based upon measurements by other methods that these measurements do
in fact have an error of about 10 parts per million from the true value.
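
The figures quoted from the table can be checked with a short calculation. The sketch below is an illustration under a stated assumption: the text does not say how the probable error was obtained, so the usual convention that the probable error is 0.6745 standard errors (the normal-distribution value) is assumed here. The variable names are hypothetical; the g values are those quoted in the table, in cm/sec^2.

```python
import math

# Values of g (cm/sec^2) as quoted in the table.
original_1906 = 980.100
last_three = [980.088, 980.080, 980.084]

n = len(last_three)
mean_g = sum(last_three) / n
sample_sd = math.sqrt(sum((x - mean_g) ** 2 for x in last_three) / (n - 1))

# Assumed convention: probable error = 0.6745 * standard error of the mean.
pe_mean = 0.6745 * sample_sd / math.sqrt(n)
pe_ppm = 1e6 * pe_mean / mean_g

# Shift produced by the 1942 revision of the 1906 Potsdam value.
revision_ppm = 1e6 * (original_1906 - last_three[0]) / original_1906

print(round(mean_g, 3), round(pe_ppm, 1), round(revision_ppm, 1))
```

Under that assumption the mean is 980.084, the probable error of the mean is a little under 2 parts per million, and the revision amounts to about 12 parts per million. The small probable error reflects only the scatter among the three pendulum results, which is why it can sit well below a shared systematic error of 10 to 15 ppm.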

Here at NBS two of our scientists, C. H. Page and D. R. Tate, are now
designing new experiments to get at the answer by methods which differ in
principle from the older ones.

They will use a quite different type of pendulum and also will time a
freely falling object, falling in vacuum. They are making every effort to
design the experiment to eliminate known sources of error, to have each error
subject to experimental estimation or below the desired limit of accuracy
(about 1 part per million) and to take advantage of the use of statistical
variation of parameters in the experiment itself. They are working closely
with our Statistical Engineering Section to get the benefit of their advice
in the design phases of the experiment instead of waiting until the data is
in, as is all too often done.

Unless one has had an opportunity to participate in one of these
precision measurements, it is difficult to understand the complexities
that arise. In pendulums, the motions cause bending and stretching,
minute temperature changes cause changes of length, wear changes the
form of the bearings, even stray electric and magnetic fields cause
significant perturbations. In the free fall experiment, mention of only one
of many difficulties indicates the kind of factors that must be considered.
One assumes that the laboratory is at rest on the earth during the time the
object falls. This is not strictly true. Due to minor earthquakes,
microseisms, the laboratory does not stay at rest with the precision needed.
It is, therefore, necessary to set up a seismograph and record the
microseisms. The free fall can then be made during quiet periods and
corrections can be made for the motion of the laboratory during the fall.
These motions may be as small as 40 millionths of an inch but they are still
significant.

It is the consideration of the whole array of such errors and the
design of experiment to take account of them that makes precision
measurement such a fascinating science and one which depends very strongly
upon proper design of experiment.

RECENT RESEARCH IN STATISTICAL PROBLEMS
IN SUBJECTIVE TESTING¹,²

Ralph  Allan  Bradley 

Virginia  Agricultural  Experiment  Station 
of  the  Virginia  Polytechnic  Institute 

1.  INTRODUCTION. There is widespread interest in the design, conduct,
and analysis of experiments involving the subjective opinions of samples
and panels of individuals. Applications arise in food processing,
photography, distilling and brewing, textile research, wood technology,
petroleum products research, and in a host of other areas of research.

Problems, many of which at least have statistical aspects, arise in
the selection of consumer samples and expert taste panels, in the training
of panel members, in the design of experiments, in the development of
scoring scales, and in the analysis and interpretation of experimental
data. We shall present the results of recent research and illustrative
examples on techniques that deal with the sensitivities of scoring scales,
the variabilities of judges using scoring scales, the design of
experiments with scoring scales, and the design of ranking experiments.

We shall not here discuss in any detail the selection or training of
a taste panel, the selection of a consumer panel, or the development of a
scoring scale. Some general discussion of the problems involved is
given in the reference (Bradley [1953]) which has a large classified
bibliography including papers on these subjects. Expert taste panels are
usually selected through use of a system of triangle tests (a triangle
test involves the selection of the odd sample from three samples of which
two are identical). In the cited reference, we illustrate the use of
sequential triangle tests. Hopkins and Gridgeman (1955) compare the
sensitivities of paired and triad flavor intensity difference tests.
Kramer (1955, 1956) has provided tables and discussions on the use of
multiple matching systems for the selection of judges as an alternative
to use of triangle tests. Procedures for the selection of a consumer
panel should basically depend on sampling survey techniques and those
used in opinion polls. In such studies it is well to keep the techniques
simple and paired-sample preference tests are usually used along with a
supplementary questionnaire. Ranking techniques in paired comparisons
may be used in these surveys and the method is summarized in a subsequent
section. There are many psychological aspects to the development of a
scoring scale and we shall not discuss them here. When a scale is
developed, the distributions of scores on the scale should be examined.
Hopkins (1950) considered such distributions.
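
As a small illustration of the triangle test mentioned above, a judge who cannot distinguish the samples picks the odd one with probability 1/3, so the chance of a good record by guessing alone follows a binomial distribution. The sketch below covers only a fixed number of tests; it is not the sequential procedure of the cited reference, and the function name and the record of 7 correct in 10 are hypothetical.

```python
from math import comb


def chance_tail_probability(n_tests, n_correct, p_guess=1.0 / 3.0):
    """P(at least n_correct right in n_tests triangle tests by guessing)."""
    return sum(
        comb(n_tests, k) * p_guess ** k * (1.0 - p_guess) ** (n_tests - k)
        for k in range(n_correct, n_tests + 1)
    )


# Hypothetical candidate judge: 7 correct selections in 10 triangle tests.
p_value = chance_tail_probability(10, 7)
print(round(p_value, 4))
```

A small tail probability indicates that the candidate's record is unlikely to be the result of guessing.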


1.  Presented at the 1956 Gordon Conference on Statistics in Chemistry
and Chemical Engineering, New Hampton, N. H., August 23, 1956.

2.  A report based largely on research sponsored by the Agricultural
Research Service, U. S. D. A., under a Research and Marketing Act
Contract, No. 12-14-100-126(20).

In the following sections we shall use the notation of the various
basic reference papers rather than maintain a more uniform notation in
this paper. This should permit the reader to more easily associate our
examples with the theory in the references.

2.  SENSITIVITY COMPARISONS. In the development of scoring scales and
other experimental techniques, it is often desirable that two alternative
methods be compared. Cochran (1943) discussed the comparison of different
scales of measurement for experimental results and indicated where further
research was required. We have provided means of comparing the
sensitivities of similar experiments in two recent papers (Schumann and
Bradley [1956], Bradley and Schumann [1956]). This recent research permits a
test on the equality of the parameters of non-centrality of F-distributions
associated with tests of treatment equality in two independent but
parallel experiments containing the same set of treatments in identical
experimental designs. The experiments may differ in the scoring scale
used or in some other criterion of measurement that does not interact
with treatments. Good experimental data to illustrate the method
appeared as this paper was in preparation.

Kauman, Gottstein, and Lantican (1956) were interested in the
quality evaluation of dried veneer. Two schemes were used to evaluate
quality of sheets of veneer and they are designated as "numerical" and
"subjective" although both were somewhat subjective. In the numerical
scheme various types of degrade were listed with numerical scores for
the severity of the degrade and weights were given for use in combining
degrade scores to obtain a quality score. A quality rating of 50 in
the numerical scheme was very bad and the maximum possible score; a
quality rating of 0 was excellent and indicates a sheet free from
degrade. In the subjective scheme "quality ratings" were assigned on a
0-8 scale with 0, excellent and 8, very bad. Twenty selected sheets of
veneer were evaluated by three observers, twice with each scheme, and
repeat observations were spaced by several days with the order of
presentation of the sheets changed. The complete tables of scores
are given in the reference; we repeat the analyses of variance in
Table 1.


                               Table 1

              Analyses of Variance for Quality Ratings*

                                 Numerical scheme     Subjective scheme
                   Degrees of    Sum of      Mean     Sum of     Mean
Factor             freedom       squares     square   squares    square

Sheets (S)             19        12826.16    675.1    336.90     17.73
Observers (O)           2          170.72     85.0      3.70      1.852
Repetitions (R)         3          168.13     56.0      0.61      0.2042
Interaction SO         38          823.61     21.6     30.12      0.7928
Interaction SR         57          595.07     10.45    22.13      0.3884

* A reproduction of part of a table in Kauman, Gottstein, and Lantican
(1956), page 145.

While the authors of the cited reference properly considered Model II
of the analysis of variance and estimated variance components, we shall
illustrate how to apply a test of the sensitivities of the two experiments
conditional on the observers and samples actually used in the experiments
and assume Model I of analysis of variance with "fixed" effects. Under
these conditions, the expected value of the mean square for sheets is

(1.2)    $E[\mathrm{M.S.}(S)] = \sigma^2 + k \sum_{i=1}^{t} \tau_i^2 / (t-1)$

in general for t sheets and k observations on each sheet. $\tau_i$ is the
"effect" of sheet i, i = 1, ..., t. In the examples,

(2.2)    $E[\mathrm{M.S.}(S)] = \sigma^2 + 6 \sum_{i=1}^{20} \tau_i^2 / 19$.

$\sigma^2$ is the expectation of the error mean square in both (1.2) and (2.2).

The parameter of non-centrality of the F-test for sheets is, in general,

(3.2)    $\lambda = k \sum_{i=1}^{t} \tau_i^2 / 2\sigma^2$

and, in the examples,

(4.2)    $\lambda = 6 \sum_{i=1}^{20} \tau_i^2 / 2\sigma^2$

when the F-density is written

(5.2)    $f(F) = (a/b)^a [B(a,b)]^{-1} e^{-\lambda} F^{a-1} (1+aF/b)^{-(a+b)} \cdot {}_1F_1\left[a+b,\ a,\ \frac{a\lambda F}{b(1+aF/b)}\right], \quad 0 \le F < \infty$

where F has 2a and 2b degrees of freedom, ${}_1F_1$ is the confluent
hypergeometric series, and B represents the beta function. It is seen at
once that $\lambda$ is a parameter expressing the magnitudes of treatment
effects in a scale in terms of the experimental error associated with
the scale. $\lambda$ is the appropriate parameter to measure the sensitivity
of a scale. We shall test the hypothesis, $H_0: \lambda_1 = \lambda_2$, against
the alternative, $H_a: \lambda_1 \ne \lambda_2$, using the subscripts 1, for the
numerical scheme, and 2, for the subjective scheme.

To apply the test, we compute the two F-ratios with 19 and 57
degrees of freedom (now a = 9.5, b = 28.5) and obtain

    $F_1 = 64.60$ and $F_2 = 45.65$.

The statistic used is

(6.2)    $w = F_1/F_2 = 64.60/45.65 = 1.42$.
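
The arithmetic behind (6.2) can be reproduced from the mean squares in Table 1. The following is a minimal sketch; the variable and function names are ours and are only illustrative, and the mean squares are the values for sheets (19 degrees of freedom) and for error (57 degrees of freedom) as read from Table 1.

```python
# Mean squares read from Table 1: (sheets, error) for each scheme.
mean_squares = {
    "numerical": (675.1, 10.45),
    "subjective": (17.73, 0.3884),
}


def f_ratio(ms_sheets, ms_error):
    """Variance ratio for sheets against the error mean square."""
    return ms_sheets / ms_error


f1 = f_ratio(*mean_squares["numerical"])
f2 = f_ratio(*mean_squares["subjective"])
w = f1 / f2  # statistic (6.2)

# Half the degrees of freedom, as used in the F-density (5.2).
a, b = 19 / 2, 57 / 2

print(round(f1, 2), round(f2, 2), round(w, 2), a, b)
```

The two ratios round to 64.60 and 45.65, and their quotient is about 1.415, which the text reports as 1.42.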

The distribution of w under $H_0$ depends on $\lambda = \lambda_1 = \lambda_2$,
which in general is unknown. In practice it is clear that the test is not
very sensitive to small changes in $\lambda$ and we in fact estimate
$\lambda$ from the data using

(7.2)    $\hat\lambda_i = a(F_i - 1), \quad i = 1, 2$.

In the examples,

    $\hat\lambda_1 = 9.5(64.60 - 1) = 604.2$

and

    $\hat\lambda_2 = 9.5(45.65 - 1) = 424.2$.

We take $\hat\lambda$ to be the average of $\hat\lambda_1$ and $\hat\lambda_2$,

(8.2)    $\hat\lambda = \frac{1}{2}(604.2 + 424.2) = 514.2$.

A table of values $w_0$ such that $P(w > w_0 | H_0) = 0.05$ is given by
Bradley and Schumann in the cited references. To enter this table, one
requires

(9.2)    $a' = (a+\hat\lambda)^2/(a+2\hat\lambda) = (9.5+514.2)^2/(9.5+1028.4) = 264.2$

and b = 28.5. The table is symmetric in the sense that
$w_0(a',b) = w_0(b,a')$ and we obtain $w_0 = 1.85$ by consulting the table.
Now $H_a$, as postulated, is two-sided and hence the significance level
being used is 0.10. w in (6.2) does not exceed $w_0$ and consequently we do
not reject $H_0$ at the 10% level of significance. We are in accord with
the authors (Kauman et al.) who state "the present experiment has shown
that the subjective evaluation can yield results of an accuracy approaching
that of the numerical scheme, although the accuracy of the latter was
slightly superior".
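
The steps (7.2) to (9.2) and the final comparison can be traced in a few lines. This is a sketch only: the critical value 1.85 is simply the figure quoted above from the Bradley and Schumann table, not something computed here, and all names are illustrative.

```python
a, b = 9.5, 28.5          # half the degrees of freedom (19 and 57)
f1, f2 = 64.60, 45.65     # F-ratios for the numerical and subjective schemes

w = f1 / f2                        # statistic (6.2)
lam1 = a * (f1 - 1.0)              # (7.2) for the numerical scheme
lam2 = a * (f2 - 1.0)              # (7.2) for the subjective scheme
lam = 0.5 * (lam1 + lam2)          # (8.2), pooled estimate under H0
a_prime = (a + lam) ** 2 / (a + 2.0 * lam)   # (9.2), entry to the table

w0 = 1.85                 # tabled value quoted in the text for (a', b)
reject_h0 = w > w0

print(round(lam1, 1), round(lam2, 1), round(lam, 1), round(a_prime, 1), reject_h0)
```

Carrying unrounded intermediate values gives 264.2 for a' as in the text, and since w is about 1.42 the hypothesis of equal sensitivities is not rejected.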

The theory of the test of sensitivity is given in detail by Schumann
and Bradley (1956) and other applications are given by Bradley and Schumann
(1956). As a somewhat different application, the method may also be used
to compare values of $R^2$, the square of the multiple correlation
coefficient, for two similar but independent regression studies based on the
usual regression model. The theory involves an approximation which appears to
be good. The distribution of w should come from the joint distribution of
two non-central variance-ratios with equal pairs of degrees of freedom and
equal parameters of non-centrality. What was done was to approximate
the non-central F-distributions using central F-distributions and to obtain
the distribution of w taking w to be the ratio of two independent central
F-variates.

Applications are limited since a table is only available for a
one-sided test at the 0.05 level. Schumann is preparing additional tables.

3.  JUDGE VARIABILITY AND JUDGE COMPARISONS. When items are scored in
subjective experimentation, there is no knowledge of the "true worth" of
the sample in the units of the scoring scale. It is then difficult to
assess the judging ability of a judge. Russell and Bradley (1956) have
provided means of estimating the variability of a judge in terms of the
deviations of his scores for an item from those of the remaining judges
but permitting a judge a possible constant bias in his assignment of
scores. Similar procedures were considered by Grubbs (1948) and Ehrenberg
(1950) and they obtained the same estimators from somewhat different
demonstrations but did not develop the test procedures illustrated below.

Consider a two-way classification with t items or treatments and r
judges. The model with fixed effects is

(1.3)    $y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}, \quad i = 1, \ldots, t; \; j = 1, \ldots, r$

where $y_{ij}$ is the score assigned by the jth judge to the ith item,
$\mu$ is the grand mean, the average level of judging, $\tau_i$ is the effect
of the ith item, $\beta_j$ is the effect (or bias) of the jth judge, and
$\epsilon_{ij}$ are independent normal variates with zero means. Contrary to
the usual model of analysis of variance, we admit the possibility of
heterogeneous error variances in the sense that

(2.3)    $E(\epsilon_{ij}^2) = \sigma_j^2$.

$\sigma_j^2$ is the variance of the jth judge and is to be estimated.

The estimator of $\sigma_j^2$ to be used is

(3.3)    $\hat\sigma_j^2 = \frac{r Q_j}{(t-1)(r-2)} - \frac{E}{(t-1)(r-1)(r-2)}$

where

(4.3)    $Q_j = \sum_{i=1}^{t} (y_{ij} - y_{i.} - y_{.j} + y_{..})^2$

and

(5.3)    $E = \sum_{i=1}^{t} \sum_{j=1}^{r} (y_{ij} - y_{i.} - y_{.j} + y_{..})^2$,

the latter being the error sum of squares from the analysis of variance
of the two-way classification. $\hat\sigma_j^2$ is an unbiased estimator of
$\sigma_j^2$ but, like an estimate of a variance component, may occasionally
be negative. In (4.3) and (5.3), $y_{i.}$ is the average of scores for
treatment i, $y_{.j}$ is the average of scores assigned by the jth judge, and
$y_{..}$ is the average of all scores. The requirement that $\epsilon_{ij}$
in (1.3) be normal is only met approximately in use of a discrete scoring
scale but does not affect the estimation of $\sigma_j^2$. In later paragraphs
of this section, we shall assume that departures from normality do not
seriously affect our test procedures.

We shall again illustrate this work using the data of Kauman et al.
The detailed example is for Test 1 using the subjective scheme. Scores
are listed in Table 2. In Table 3 we show values of
$(y_{i.} + y_{.j} - y_{..})$ obtained by first writing down the marginal
entries and then computing the required table entries. In Table 4 we have
the residuals, $(y_{ij} - y_{i.} - y_{.j} + y_{..})$, obtained by subtracting
entries in Table 3 from corresponding entries in Table 2. Values of $Q_j$
and E are given in the lower margin of Table 4 and are obtained by
accumulating the squares of entries in the columns above as required by
(4.3); $E = \sum_{j=1}^{r} Q_j$ was so obtained. The values of
$\hat\sigma_j^2$ computed using (3.3) are listed in Table 5 along with those
for the other three tests of Kauman et al. To illustrate the computations,
we use observer A and obtain

    $\hat\sigma_A^2 = \frac{3 Q_A}{(19)(1)} - \frac{22.50}{(19)(2)(1)} = 0.72$.


Certain checks on the computation are possible. The residuals in
Table 4 have row and column totals that are zero except for rounding.
Also, as already noted, $\sum_{j=1}^{r} Q_j = E$, and E will usually have been
obtained directly from the analysis of variance. A final check follows
from the fact that

    $\frac{(t-1)(r-1)}{r} \sum_{j=1}^{r} \hat\sigma_j^2 = E$.

In the example,

    $\frac{(19)(2)}{3} \sum_{j=1}^{3} \hat\sigma_j^2 = \frac{38}{3} [(0.72) + (0.53) + (0.53)] = 22.55$,

which agrees with E = 22.50 except for rounding of the estimates.
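
The whole computation of this section can be reproduced from the scores of Table 2. The sketch below is our own illustration of (3.3), (4.3) and (5.3), with hypothetical names; it forms the residuals directly rather than through the intermediate Tables 3 and 4, and it requires at least three judges because of the factor r - 2.

```python
# Scores from Table 2: one row per sheet, columns are observers A, B, C.
scores = [
    (3, 3, 3), (7, 5, 7), (6, 5, 5), (7, 8, 7), (1, 2, 3),
    (5, 5, 5), (5, 6, 5), (3, 5, 4), (4.5, 5, 5), (6, 7, 7),
    (5, 4, 4), (8, 7, 8), (5, 7, 5), (1, 2, 2), (7, 7, 7),
    (1, 3, 4), (4, 3, 3), (6, 6, 5), (5, 5, 7), (3, 3, 2),
]


def judge_variances(y):
    """Return (Q_j list, E, estimated judge variances) following (3.3)-(5.3)."""
    t, r = len(y), len(y[0])
    if r < 3:
        raise ValueError("the estimator (3.3) needs at least three judges")
    row_mean = [sum(row) / r for row in y]
    col_mean = [sum(row[j] for row in y) / t for j in range(r)]
    grand = sum(row_mean) / t
    # Q_j: sum of squared residuals in the column of judge j, as in (4.3).
    q = [
        sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2 for i in range(t))
        for j in range(r)
    ]
    e = sum(q)  # error sum of squares (5.3)
    var = [
        r * qj / ((t - 1) * (r - 2)) - e / ((t - 1) * (r - 1) * (r - 2))
        for qj in q
    ]
    return q, e, var


q, e, var = judge_variances(scores)
t, r = len(scores), len(scores[0])
check = (t - 1) * (r - 1) / r * sum(var)  # should reproduce E

print([round(v, 2) for v in var], round(e, 2), round(check, 2))
```

The estimates round to 0.72, 0.53 and 0.53 for observers A, B and C, and the check identity reproduces E exactly when unrounded estimates are used.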

A test of homogeneity of variances is possible only when r = 3.
The only situation wherein the estimators $\hat\sigma_j^2$ of $\sigma_j^2$ are
maximum likelihood estimators is when r = 3 and then an approximate test
may be made. Consider the hypothesis,

    $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$,

and the alternative,

    $H_a: \sigma_j^2 \ne \sigma_k^2$ for some j and k; j, k = 1, 2, 3.

The likelihood ratio test statistic, distributed approximately as a
$\chi^2$-variate with 2 degrees of freedom for large samples, is

(7.3)    $\chi_2^2 = -(2.3026)(t-1)[2 \log (t-1) + \log(\hat\sigma_1^2 \hat\sigma_2^2 + \hat\sigma_1^2 \hat\sigma_3^2 + \hat\sigma_2^2 \hat\sigma_3^2) - 2 \log E + \log 4/3]$

         $= -(2.3026)(19)[2 \log 19 + \log[(0.72)(0.53) + (0.72)(0.53) + (0.53)(0.53)] - 2 \log 22.50 + \log 4/3] = 0.14$.
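
The evaluation of (7.3) can be sketched as follows, using the rounded estimates 0.72, 0.53, 0.53 and E = 22.50 given in the text. The function name is ours. The comparison value is the upper 5 per cent point of chi-square with 2 degrees of freedom, which equals -2 ln(0.05), about 5.99; that level is chosen here for illustration. The statistic is sensitive to rounding of the inputs, so unrounded estimates give a somewhat different, though still small, value.

```python
import math


def homogeneity_chi2(var, e, t):
    """Statistic (7.3) for three judges, using common logarithms."""
    v1, v2, v3 = var
    pair_sum = v1 * v2 + v1 * v3 + v2 * v3  # must be positive for the log
    bracket = (
        2.0 * math.log10(t - 1)
        + math.log10(pair_sum)
        - 2.0 * math.log10(e)
        + math.log10(4.0 / 3.0)
    )
    return -2.3026 * (t - 1) * bracket


chi2 = homogeneity_chi2([0.72, 0.53, 0.53], 22.50, 20)

# Upper 5% point of chi-square with 2 degrees of freedom: P(X > x) = exp(-x/2).
critical_5pct = -2.0 * math.log(0.05)

print(round(chi2, 2), round(critical_5pct, 2), chi2 > critical_5pct)
```

The statistic comes out at about 0.14, far below 5.99, so homogeneity of the observer variances is not rejected.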

                    Table 2

            Quality Ratings for the
         Subjective Quality Evaluation
                    Test 1*

                   Observers
    Sheet       A       B       C       Mean

      1         3       3       3       3.00
      2         7       5       7       6.33
      3         6       5       5       5.33
      4         7       8       7       7.33
      5         1       2       3       2.00
      6         5       5       5       5.00
      7         5       6       5       5.33
      8         3       5       4       4.00
      9         4.5     5       5       4.83
     10         6       7       7       6.67
     11         5       4       4       4.33
     12         8       7       8       7.67
     13         5       7       5       5.67
     14         1       2       2       1.67
     15         7       7       7       7.00
     16         1       3       4       2.67
     17         4       3       3       3.33
     18         6       6       5       5.67
     19         5       5       7       5.67
     20         3       3       2       2.67

* From Table 3, Kauman, Gottstein, and Lantican (1956), page 135.

                    Table 3

    Values of $(y_{i.} + y_{.j} - y_{..})$ for the
         Subjective Quality Evaluation
                    Test 1
The multiplier, 2.3026, in (7.3) is included so that common logarithms may
be used in the computation of $\chi^2_2$. The small value of $\chi^2_2$ indicates that the
observers may be taken to have homogeneous variances.

In Table 5, we have included values of $\chi^2_2$ for all four tests and show
also values of $\hat\sigma^2$, the error mean square from the analysis of variance.
Note that only in one of the numerical tests was $\chi^2_2$ significant at the 5%
level of significance. The estimates of variance in the numerical scheme
are considerably larger than in the subjective scheme. This does not of
course suggest a preference for the subjective scheme but is merely a
result of the scales used in the scoring methods. The appropriate method
of comparing the scales is the one given in the preceding section.

Table 5

Estimates of Variance and $\chi^2_2$ to Test for
Homogeneity of Observer Variances for All
Four Tests of Kauman et al.

                       Observer Variances, $\hat\sigma_j^2$     Error Mean
Tests                    A         B         C       Square, $\hat\sigma^2$    $\chi^2_2$
Subjective Test 1       0.72      0.53      0.53        0.59         0.14
Subjective Test 2       0.46      0.58      0.74        0.59         0.26
Numerical Test 1        4.58     10.79     33.02       16.13         6.42*
Numerical Test 2        2.19     26.40     21.69       16.84         4.14

Observer A was the only observer with previous experience in judging
veneer except for brief training sessions before the experiment began.
Another test, and this is an exact test, is possible. Consider the null
hypothesis

$H_0: \sigma_1^2 = \sigma^2$, given $\sigma_2^2 = \ldots = \sigma_r^2 = \sigma^2$,

and the alternative,

$H_a: \sigma_1^2 < \sigma^2$, given $\sigma_2^2 = \ldots = \sigma_r^2 = \sigma^2$.

The statistic used is F, with (t-1) and (t-1)(r-2) degrees of freedom.
$H_a$ may have either of the possible one-sided forms or be two-sided. For
the form of $H_a$ shown, small values of F are significant. In the example,
there is no point in testing $H_0$ versus $H_a$ in this test in view of the
homogeneity of variances demonstrated above. However, we shall proceed in
order to illustrate the method.
"adaption", the effect of the presence of one treatment on another in the
same incomplete block. A doubly balanced incomplete block design is one
which, in addition to being balanced, has all triplets of treatments
appearing in incomplete blocks an equal number of times. The use of doubly
balanced incomplete block designs permitted easy evaluation of the
additional parameters inserted in the linear model. Calvin's model is

(2.4)  $y_{hi} = n_{hi}\,(\mu + \beta_h + \tau_i + \sum_j n_{hj}\, m_{ij}\, \alpha_{ij} + \epsilon_{hi})$

where

$y_{hi}$ is an observation on treatment i in block h,

$n_{hi}$ = 1 if treatment i occurs in block h
       = 0 otherwise,

$\mu$ represents the average level of scoring,

$\beta_h$ represents the effect of block h (perhaps due to the taster doing
the scoring, the time of day, etc.),

$\tau_i$ is the effect of treatment i,

$m_{ij}$ = 1 if i < j
       = -1 if j < i,

$\alpha_{ij}$ is the effect of the presence of treatment j on treatment
i ($\alpha_{ij} = -\alpha_{ji}$), and

$\epsilon_{hi}$ is equivalent to $\epsilon_{ijk}$ in (1.4) above.

Calvin called the effects measured by $\alpha_{ij}$ the correlation effects. We
shall not give examples of analyses using either the Scheffé or the Calvin
designs here but instead refer the reader to the references for such
examples.


Factorial treatment combinations are often required in subjective testing,
for food samples may result from a variety of process changes in their
manufacture as may photographic samples, dye samples and the like. Means of
incorporating factorial treatments in incomplete block designs are then
required. That this may be done in balanced incomplete block designs seems to
be well known although we have not found a direct reference. Kramer and
Bradley (1956, 1956a) have shown how to use factorials in group-divisible,
two-associate class, partially balanced, incomplete block designs. We shall
give an example here and note that additional examples are given in the
references along with the theory. We use only an intra-block analysis of
variance; Walpole at the Virginia Polytechnic Institute is considering
inter-block analyses. Kramer is also considering extensions to other types of
two-associate class, partially balanced, incomplete block designs.
A group-divisible, two-associate class, partially balanced, incomplete
block design has design parameters as follows:

v: the number of treatments or varieties,
r: the number of observations on each treatment,
k: the number of units in an incomplete block,
b: the number of incomplete blocks,
m: the number of groups,
n: the number of treatments in a group, v = mn, where treatments in the
   same group are first associates and treatments not in the same group
   are second associates,
$\lambda_1$: the number of times two first associate treatments appear together
   in incomplete blocks, and
$\lambda_2$: the number of times two second associate treatments appear together
   in incomplete blocks.

For these designs, the treatments may be given in an m by n rectangular
association scheme. Bose, Clatworthy, and Shrikhande (1954) have catalogued
all known designs of this class with block size, $3 \le k \le 10$, and $r \le 10$.
We consider an example with v = 8, r = 3, k = 3, b = 8, m = 4, n = 2,
$\lambda_1 = 0$, $\lambda_2 = 1$ designated as Design R5 of the catalogue. This is a
made-up example as no data were available.

For the example, we have the basic association scheme of Table 6 where
treatments in the same row are first associates and we use a double
subscript notation to designate treatments and the symbol V. In Table 7 we
show the association scheme for a 4 by 2 factorial with an A-factor at four
levels and a C-factor at two levels. The design lay-out, observations,
block totals $B_s$ and grand total G are given in Table 8. The treatments in
Table 8 are associated with the factorials through the correspondence of items
in Tables 6 and 7.


Table 6

Association Scheme for
8 Treatments

V11   V12
V21   V22
V31   V32
V41   V42

Table 7

Association Scheme for
the 4x2 Factorial

A1C1   A1C2
A2C1   A2C2
A3C1   A3C2
A4C1   A4C2

Table 8

Design and Observations for the Eight Treatments

Block s    Treatments          Observations     $B_s$
   1       V11   V21   V41     35   24   39      98
   2       V21   V31   V12     42   45   48     135
   3       V31   V41   V22     13   15   17      45
   4       V41   V12   V32     19   22   25      66
   5       V12   V22   V42     28   30   31      89
   6       V22   V32   V11     51   52   54     157
   7       V32   V42   V21     60   65   67     192
   8       V42   V11   V31     54   57   58     169

Total G = 951


The basic analysis of variance without consideration of the factorial
effects is straight-forward. The total sum of squares and the block sum of
squares are computed in the usual way. We find it useful to compute the
adjusted treatment sum of squares from the estimates of treatment effects.

The linear model is

(3.4)  $y_{ijs} = \mu + \tau_{ij} + \beta_s + \epsilon_{ijs}$

where $y_{ijs}$ is the observation on $V_{ij}$ if that treatment is in block s, $\mu$ is
the overall average, $\tau_{ij}$ is the effect of $V_{ij}$, $\beta_s$ is the effect of block s,
and $\epsilon_{ijs}$ is the error variate as described after (1.4). If $t_{ij}$ is the
estimator of $\tau_{ij}$, in general,

(4.4)  $t_{ij} = [\,kT_{ij} - B_{ij.} - (\lambda_2 - \lambda_1)(k\sum_j T_{ij} - \sum_j B_{ij.})/(v\lambda_2)\,]\,/\,(\lambda_1 + rk - r)$

and, in the example,

(5.4)  $t_{ij} = \frac{1}{2}T_{ij} - \frac{1}{16}\sum_j T_{ij} - \frac{1}{6}B_{ij.} + \frac{1}{48}\sum_j B_{ij.}$.

$T_{ij}$ is the total for $V_{ij}$; $B_{ij.}$ is the total of block totals of those blocks
containing $V_{ij}$. Values of $T_{ij}$, $\sum_j T_{ij}$, $B_{ij.}$, $\sum_j B_{ij.}$,
and $t_{ij}$ are given in Table 9 in positions corresponding to the array of
Table 6. In addition, in Table 9, we show the totals, $t_{i.} = \sum_j t_{ij}$, $t_{.j} = \sum_i t_{ij}$,
and the averages, $\bar t_{i.} = t_{i.}/n$ and $\bar t_{.j} = t_{.j}/m$. The adjusted treatment sum
of squares may in general be written as

(6.4)  Adj. Treat. S.S. = $\frac{(\lambda_1 + rk - r)}{k}\sum_i\sum_j t_{ij}^2 + \frac{(\lambda_2 - \lambda_1)}{k}\sum_i t_{i.}^2$

and here becomes

(7.4)  Adj. Treat. S.S. = $2\sum_i\sum_j t_{ij}^2 + \frac{1}{3}\sum_i t_{i.}^2 = 49.54$.
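
The estimates and the adjusted treatment sum of squares can be reproduced with exact fractions, as in the following sketch. Two points are assumptions rather than statements from the text: the general expression used for $t_{ij}$ is one derived from the intra-block normal equations for a group-divisible design (it reproduces the Table 9 values), and the third observation in block 2 is taken as 48, the value that agrees with the block total 135 and with $T_{12} = 98$. Function and variable names are hypothetical.

```python
from fractions import Fraction

# Block layout and observations as read from Table 8; keys are (i, j) for V_ij.
blocks = [
    [((1, 1), 35), ((2, 1), 24), ((4, 1), 39)],
    [((2, 1), 42), ((3, 1), 45), ((1, 2), 48)],
    [((3, 1), 13), ((4, 1), 15), ((2, 2), 17)],
    [((4, 1), 19), ((1, 2), 22), ((3, 2), 25)],
    [((1, 2), 28), ((2, 2), 30), ((4, 2), 31)],
    [((2, 2), 51), ((3, 2), 52), ((1, 1), 54)],
    [((3, 2), 60), ((4, 2), 65), ((2, 1), 67)],
    [((4, 2), 54), ((1, 1), 57), ((3, 1), 58)],
]
v, r, k, m, n, lam1, lam2 = 8, 3, 3, 4, 2, 0, 1

T = {}  # treatment totals T_ij
B = {}  # B_ij.: sum of the totals of the blocks containing V_ij
for block in blocks:
    block_total = sum(y for _, y in block)
    for trt, y in block:
        T[trt] = T.get(trt, 0) + y
        B[trt] = B.get(trt, 0) + block_total


def estimate(i, j):
    """Intra-block estimate t_ij (assumed general form, see the lead-in)."""
    group_T = sum(T[(i, jj)] for jj in range(1, n + 1))
    group_B = sum(B[(i, jj)] for jj in range(1, n + 1))
    correction = Fraction(lam2 - lam1, v * lam2) * (k * group_T - group_B)
    return (k * T[(i, j)] - B[(i, j)] - correction) / (r * k - r + lam1)


t = {(i, j): estimate(i, j) for i in range(1, m + 1) for j in range(1, n + 1)}

# Adjusted treatment sum of squares, K1 * sum t_ij^2 + K2 * sum t_i.^2.
K1 = Fraction(lam1 + r * k - r, k)
K2 = Fraction(lam2 - lam1, k)
row_tot = [sum(t[(i, j)] for j in range(1, n + 1)) for i in range(1, m + 1)]
adj_treat_ss = K1 * sum(x * x for x in t.values()) + K2 * sum(x * x for x in row_tot)
print(round(float(t[(1, 1)]), 3), round(float(adj_treat_ss), 2))
```

Exact fractions are used so that the comparison with the printed three-decimal estimates is not blurred by floating-point rounding.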
DURHAM, NORTH CAROLINA

IN REPLY REFER TO

Dr. I. R. Hershner, Jr.
Research and Development Field
Office of the Chief of Research and Development
Fort Belvoir, Virginia

Dear Ray:

Following the Second Conference on the Design of Experiments
in Army Research, Development and Testing, copies of most of the
papers presented at the meeting were collected from the authors.
This group of articles has now been published in the Proceedings
of the aforementioned conference, and we are inclosing a copy
for your use.

Sincerely yours,

1 Incl
Proceedings

Assistant
Mathematical Sciences Division
Table 9

Values of $T_{ij}$, $\sum_j T_{ij}$, $B_{ij.}$, $\sum_j B_{ij.}$, and $t_{ij}$

 i     Entry          j = 1      j = 2      Totals
 1     $T_{1j}$          146         98        244
       $B_{1j.}$         424        290        714
       $t_{1j}$         1.958      0.292      2.250

 2     $T_{2j}$          133         98        231
       $B_{2j.}$         425        291        716
       $t_{2j}$        -3.854      0.979     -2.875

 3     $T_{3j}$          116        137        253
       $B_{3j.}$         349        415        764
       $t_{3j}$        -0.063     -0.562     -0.625

 4     $T_{4j}$           73        150        223
       $B_{4j.}$         209        450        659
       $t_{4j}$         1.458     -0.208      1.250

Totals $t_{.j}$         -0.501      0.501      0.000
Averages $\bar t_{.j}$      -0.125      0.125

To complete the basic analysis, we have

(8.4)  Unadj. Block S.S. = $\sum_{s=1}^{b} B_s^2/k - G^2/rv = 6384.95$,

(9.4)  Total S.S. = $\sum_i\sum_j\sum_s y_{ijs}^2 - G^2/rv = 6593.62$,

and the error sum of squares is obtained by subtraction,

(10.4)  Error S.S. = Total S.S. - Unadj. Block S.S. - Adj. Treat. S.S. = 159.13.

Degrees of freedom are: Treatments, (v-1) = 7; Blocks, (b-1) = 7;
Error, [v(r-1)-b+1] = 9; Total, (rv-1) = 23. The analysis of variance is
given in Table 13.
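
A short sketch of the sums of squares in (8.4) to (10.4) follows. It takes the adjusted treatment sum of squares, 49.54, as given in (7.4), and reads the third observation in block 2 as 48, the value consistent with the block total 135; both are assumptions of this illustration, and the variable names are hypothetical.

```python
# Observations by block, as read from Table 8.
blocks = [
    (35, 24, 39), (42, 45, 48), (13, 15, 17), (19, 22, 25),
    (28, 30, 31), (51, 52, 54), (60, 65, 67), (54, 57, 58),
]
v, r, k, b = 8, 3, 3, 8

G = sum(sum(block) for block in blocks)      # grand total
correction = G ** 2 / (r * v)                # G^2 / rv

block_ss = sum(sum(block) ** 2 for block in blocks) / k - correction
total_ss = sum(y * y for block in blocks for y in block) - correction

adj_treat_ss = 49.54                         # taken from (7.4)
error_ss = total_ss - block_ss - adj_treat_ss

df = {
    "treatments": v - 1,
    "blocks": b - 1,
    "error": v * (r - 1) - b + 1,
    "total": r * v - 1,
}
print(round(block_ss, 2), round(total_ss, 2), round(error_ss, 2), df)
```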

To  consider  the  analysis  for  the  4  by  2  factorial  of  Table  7,  we  need 
only  partition  the  adjusted  treatment  sum  of  squares  into  adjusted  sums  of 
squares  for  A-factor,  C-factor,  and  AC-interaction.  This  is  easily  done  and 
the  basic  formulas  are 
(13.4)  Adj. AC-interaction S.S. = $K_1 \sum_i\sum_j (t_{ij} - \bar t_{i.} - \bar t_{.j})^2$

where $K_1 = (\lambda_1 + rk - r)/k$, $K_2 = (\lambda_2 - \lambda_1)/k$. Usually we compute

Adj. AC-interaction S.S. = Adj. Treat. S.S. - Adj. A-factor S.S.
                           - Adj. C-factor S.S.

In the example, $K_1 = 2$, $K_2 = 1/3$, m = 4, and n = 2. Then,

(14.4)  Adj. A-factor S.S. = $\frac{16}{3}\sum_i \bar t_{i.}^2 = 20.37$,

(15.4)  Adj. C-factor S.S. = $8\sum_j \bar t_{.j}^2 = 0.25$,

(16.4)  Adj. AC-interaction S.S. = $2\sum_i\sum_j (t_{ij} - \bar t_{i.} - \bar t_{.j})^2 = 28.92$.
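
The partition can be checked numerically from the rounded estimates of Table 9. In this sketch the multipliers $nK_1 + n^2K_2 = 16/3$ for the A-factor and $mK_1 = 8$ for the C-factor are inferred from the numerical forms (14.4) and (15.4), since the general formulas are not reproduced here; the names are illustrative.

```python
# Estimates t_ij from Table 9 (rows i = 1..4 are A levels, columns j = 1, 2 are C levels).
t = [
    [1.958, 0.292],
    [-3.854, 0.979],
    [-0.063, -0.562],
    [1.458, -0.208],
]
m, n = 4, 2
K1, K2 = 2.0, 1.0 / 3.0

row_avg = [sum(row) / n for row in t]                              # t-bar_i.
col_avg = [sum(t[i][j] for i in range(m)) / m for j in range(n)]   # t-bar_.j

a_ss = (n * K1 + n * n * K2) * sum(x * x for x in row_avg)
c_ss = m * K1 * sum(x * x for x in col_avg)
ac_ss = K1 * sum(
    (t[i][j] - row_avg[i] - col_avg[j]) ** 2
    for i in range(m) for j in range(n)
)

# The three components add up to the adjusted treatment sum of squares.
treat_ss = a_ss + c_ss + ac_ss
print(round(a_ss, 3), round(c_ss, 3), round(ac_ss, 3), round(treat_ss, 3))
```

Because the inputs are rounded to three decimals, the components agree with the printed values only to within about 0.01 or 0.02.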

Single degree of freedom comparisons may be used. Consider linear,
quadratic, and cubic trends over the levels of the A-factor and their
interactions with the C-factor. This is done in much the usual way except that
additional and different multipliers are required for components of the
A-factor, C-factor and AC-interaction sums of squares. The method will be
evident from Table 10 but to illustrate we consider the Linear A-Component.
The linear contrast for Linear A is

L(lin.A) = -3(1.958) - 3(0.292) + ... + 3(-0.208) = -0.750.

The sum of squared coefficients is

A(lin.A) = $(-3)^2 + (-3)^2 + \ldots + (3)^2$ = 40.

The multiplier is, in general for a component of the A-factor,

M(lin.A) = $(nK_1 + n^2K_2)/n$ = 8/3;

the adjusted sum of squares for the linear A-factor component is

Adj. S.S. (Lin. A) = $[L^2(\text{lin.A})/A(\text{lin.A})]\,M(\text{lin.A}) = [(-0.750)^2/40](8/3) = 0.04$.

In general, the multiplier for a component of the C-factor and for a
component of the AC-interaction is $K_1$ itself.
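
The linear A-component can be verified directly from the Table 9 estimates. The sketch assumes the usual orthogonal polynomial coefficients -3, -1, 1, 3 for four equally spaced levels, applied to both C levels, and the multiplier $(nK_1 + n^2K_2)/n$; the names are illustrative.

```python
# Estimates t_ij from Table 9 (rows are A levels 1..4, columns are C levels 1, 2).
t = [
    [1.958, 0.292],
    [-3.854, 0.979],
    [-0.063, -0.562],
    [1.458, -0.208],
]
n = 2
K1, K2 = 2.0, 1.0 / 3.0

linear = [-3, -1, 1, 3]          # linear trend coefficients over the A levels

# Contrast L and the sum of squared coefficients over all eight treatments.
L = sum(linear[i] * t[i][j] for i in range(4) for j in range(n))
sum_sq_coef = sum(c * c for c in linear for _ in range(n))

multiplier = (n * K1 + n * n * K2) / n   # 8/3 for a component of the A-factor
adj_ss_lin_a = L ** 2 / sum_sq_coef * multiplier
print(round(L, 3), sum_sq_coef, round(adj_ss_lin_a, 2))
```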

Now we are not restricted to a two-factor factorial but in general
may have several factors with levels $m_1, \ldots, m_p$ and
$n_1, \ldots, n_q$ so long as $\prod_{i=1}^{p} m_i = m$ and $\prod_{j=1}^{q} n_j = n$. Suppose the A-factor
were in fact a 2x2 factorial itself. If we designate these new factors
as N and P and associate them with the association schemes of Tables 6
and 7 as given in Table 11, we can analyze the experiment as a $2^2$x2
factorial subdividing the adjusted treatment sum of squares as in Table 12.
Table 11

Association Scheme for the
$2^2$x2 Factorial

N1P1C1   N1P1C2
N1P2C1   N1P2C2
N2P1C1   N2P1C2
N2P2C1   N2P2C2

The  analysis  of  variance  for  the  various  breakdowns  of  the  experimental 
data  considered  is  given  in  Table  13. 

We  believe  that  these  designs  with  factorial  treatment  combinations 
offer  a  useful  aid  in  subjective  experimentation.  The  analyses  are 
reasonably  simple  and  straight-forward  and  out  of  the  many  such  designs 
catalogued  it  should  be  easy  to  select  one  appropriate  for  the  planned 
research.  Other  applications  in  many  fields  of  experimentation  should  be 
forthcoming. 
Table 13

Intra-Block Analysis of Variance for the Illustrative Experiment

Treat. (adj.)

Subdivision for 4 by 2 Factorial

Subtotals

Subdivision for Trends in 4 by 2 Factorial

Subtotals

Subdivision for $2^2$ by 2 Factorial
5. RANKING METHODS FOR SUBJECTIVE TESTING. We have discussed statistical
methods for subjective testing for use with scoring scales up to this point.
It is our opinion, and one that is not easy to prove or disprove, that in
many experimental situations it is easier and more efficient to use ranking
methods rather than scoring methods. Any loss in efficiency due to ranking,
if indeed there is such loss of efficiency, may be offset by increased ease
and speed of experimentation which permits use of increased sample sizes for
the same time of experimentation. As we see it, the disadvantage of using
ranking methods is that such methods are not fully developed. Experimental
designs that permit use of factorials in incomplete blocks are not directly
available unless one is willing to use analysis of variance on ranks
transformed to scores through use of Table XX of Fisher and Yates (1948).
Similarly, except for the method of paired comparisons (which is widely
applicable), we do not have well developed ranking methods for use in
incomplete blocks unless transformation is again used. We shall briefly
review, but not discuss in detail, the method of paired comparisons
introduced by Bradley and Terry (1952) and Terry, Bradley and Davis (1952) and
the method of concordance for ranking in balanced incomplete block designs
presented by Durbin (1951).


Consider t treatments in n repetitions of the possible t(t-1)/2
paired treatment comparisons. The basic model for the method of paired
comparisons assumes the existence of parameters, $\pi_1, \ldots, \pi_t$, $\pi_i \ge 0$, $\sum_i \pi_i = 1$,
such that, if $X_i$ is an observation on treatment i and $X_j$ on treatment j,
the probability that $X_i < X_j$ (treatment i receives rank 1 and treatment j
receives rank 2, that is, treatment i is preferred to treatment j) is

(1.5)  $P(X_i < X_j) = \pi_i/(\pi_i + \pi_j)$.

Methods of maximum likelihood are used to obtain estimators $p_i$ of $\pi_i$.
These estimators are obtained by solution of (t+1) simultaneous (but not
independent) equations

(2.5)  $\frac{a_i}{p_i} - n\sum_{j \ne i}\frac{1}{p_i + p_j} = 0$,  i = 1, ..., t,

(3.5)  $\sum_i p_i = 1$,

where $a_i = 2n(t-1) - \sum r_i$, $\sum r_i$ is the total sum of ranks for treatment i, and
$a_i$ is essentially the number of times treatment i was given first choice.
Difficulties in application stem from the problem of solving equations
(2.5) and (3.5). Iterative methods have been suggested and tables of
values of $\sum r_i$ and $p_i$ are given in the first reference cited on this subject
and by Bradley (1954). Recently Dykstra (1956) has provided easy means of
obtaining good approximations to the solutions of these equations and, if
his approximations are used as first estimates of the solution, at most one
or two iterations are required to obtain the solution with desired accuracy.
When the estimators are obtained, a test of the hypothesis of treatment
equality,

$H_0: \pi_i = 1/t$,  i = 1, ..., t,

against the general alternative,

$H_a: \pi_i \ne 1/t$ for some i,

is based on the statistic,

(4.5)  $B_1 = n\sum_{i<j}\log(p_i + p_j) - \sum_i a_i \log p_i$

and

(5.5)  $-2\ln\lambda = (2.3026)\,[\,nt(t-1)\log 2 - 2B_1\,]$,

the latter of which has approximately the $\chi^2$-distribution with (t-1)
degrees of freedom.
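
The estimation and test just described can be sketched as follows. This is not Dykstra's approximation nor the tabled solutions: it is a plain fixed-point iteration on (2.5) with renormalisation to satisfy (3.5), offered as one simple way to solve the equations. The counts in `a` are hypothetical, the function names are illustrative, and $B_1$ is taken with the factor n on the first sum so that (5.5) vanishes when all $p_i$ equal 1/t.

```python
import math


def fit_paired_comparisons(a, n, iters=500):
    """Solve a_i/p_i - n * sum_{j != i} 1/(p_i + p_j) = 0 with sum p_i = 1."""
    t = len(a)
    p = [1.0 / t] * t
    for _ in range(iters):
        new = [
            a[i] / (n * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
            for i in range(t)
        ]
        total = sum(new)
        p = [x / total for x in new]   # renormalise so the estimates sum to 1
    return p


def likelihood_ratio(a, n, p):
    """-2 ln(lambda) computed with common logarithms, as in (4.5) and (5.5)."""
    t = len(a)
    b1 = n * sum(math.log10(p[i] + p[j]) for i in range(t) for j in range(i + 1, t))
    b1 -= sum(a[i] * math.log10(p[i]) for i in range(t))
    return 2.3026 * (n * t * (t - 1) * math.log10(2) - 2 * b1)


# Hypothetical data: t = 3 treatments, n = 10 repetitions of each pair;
# a[i] is the number of times treatment i was preferred (sum = n t (t-1) / 2).
a, n = [14, 10, 6], 10
p = fit_paired_comparisons(a, n)
stat = likelihood_ratio(a, n, p)
print([round(x, 3) for x in p], round(stat, 2))
```

The statistic would be referred to the chi-square distribution with t - 1 = 2 degrees of freedom.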

The method of paired comparisons has been further developed. The
experiment may be performed in groups of repetitions (by judges, days, etc.)
and a test of group by treatment interaction is possible. A test for the
appropriateness of the model is discussed by Bradley (1954a) and tests on
the model using extensive experimental data were made by Hopkins (1954).
The properties of the method and power comparisons are the subject of a
paper by Bradley (1955), and Abelson and Bradley (1954) attempted to
incorporate factorial arrangements of treatments into paired comparisons.
Algebraic difficulties essentially prohibit the use of factorials in
practice. Wilkinson (1956) in a thesis considered the use of our model
for paired comparisons in certain designs of Bose with blocks of size two.


Durbin generalised the method of concordance, previously available
for paired comparisons and randomized block designs [Kendall (1946)].
Durbin supposed that n objects are presented in blocks of size k with
each object ranked m times in the experiment. $\lambda$ is the number of blocks
in which any particular pair of treatments or objects occur,
$\lambda = m(k-1)/(n-1)$. The coefficient of concordance, in the absence of
tied ranks, is

(6.5)  $W = \frac{12\sum_{j=1}^{n} x_j^2 - 3m^2 n(k+1)^2}{\lambda^2 n(n^2-1)}$

where $x_j$ is the total sum of ranks for the j-th object. A test for
independence among the m rankings of an object, essentially a test for
treatment effects, is made by computing


(7.5)  $F = \frac{[\lambda(n+1)/(k+1) - 1]\,W}{1 - W}$

and taking this statistic to have approximately the F-distribution of
analysis of variance with $\nu_1$ and $\nu_2$ degrees of freedom where

(8.5)  $\nu_1 = \frac{mn\,[1 - (k+1)/\lambda(n+1)]}{\frac{mn}{n-1} - \frac{k}{k-1}} - \frac{2(k+1)}{\lambda(n+1)}$,

(9.5)  $\nu_2 = [\lambda(n+1)/(k+1) - 1]\,\nu_1$.

$\nu_1$ and $\nu_2$ may not be integers but interpolation in F-tables is possible.

A numerical example is given by Durbin and a somewhat larger example is
given by Bradley (1955a).
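
For the coefficient of concordance (6.5), a minimal sketch is given below. It assumes the form $W = [12\sum x_j^2 - 3m^2n(k+1)^2]/[\lambda^2 n(n^2-1)]$ with $\lambda = m(k-1)/(n-1)$; the function name and the small three-object design used to exercise it are hypothetical.

```python
def durbin_concordance(rank_totals, k, m):
    """Coefficient of concordance W for rankings within incomplete blocks.

    rank_totals: total sum of ranks x_j for each of the n objects
    k: block size; m: number of times each object is ranked
    """
    n = len(rank_totals)
    lam = m * (k - 1) / (n - 1)   # blocks in which any given pair occurs
    numerator = 12 * sum(x * x for x in rank_totals) - 3 * m * m * n * (k + 1) ** 2
    denominator = lam ** 2 * n * (n * n - 1)
    return numerator / denominator


# Hypothetical design: n = 3 objects in blocks {1,2}, {1,3}, {2,3} (k = 2, m = 2).
# Consistent preference order 1, 2, 3 gives rank totals 2, 3, 4.
w_agree = durbin_concordance([2, 3, 4], k=2, m=2)
# Equal rank totals correspond to no discrimination among the objects.
w_none = durbin_concordance([3, 3, 3], k=2, m=2)
print(w_agree, w_none)
```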


6. DISCUSSION. We have illustrated some of our recent work on statistical
methods for subjective testing and summarised and referred to new
developments by other authors. We believe that our discussions indicate the
direction of research and thinking on problems in subjective testing. We
have made at least one notable omission in referring to current research in
this area and now note the work of Ferris (1956). Ferris comments in detail
on problems in subjective testing and reviews much of the literature of
importance. His contributions deal with the construction and analysis of
statistical designs in the field of taste-testing. In the abstract of his
thesis, he notes

"Three models of the analysis of variance are put forward
as appropriate, illustrating respectively

(i) how classical latin square and incomplete block designs
may be modified to incorporate feature (f) above [the phenomenon
of carry-over or residual effects ("after-taste")], recommended
especially when various food-samples are being tasted serially
for flavor;

(ii) how the feature (e) [the psychological phenomenon of
adaption] may be incorporated in judging various samples of food
set out simultaneously, for color, viscosity, or other visually
determinable physical characteristic;

(iii) how one may find a suitable design even when physical
considerations impose severe limitations on the choice of
statistical designs, as in the case of incomplete block designs
of two limits".


We have further research in progress. Still is considering the
correlation between the Fisher and Yates' scores for ranks and ranks
and between the scores and variate values for various populations.
Stuart (1954) had previously considered the correlation between variate
values and ranks. It is possible that this study may yield additional
light on the use of the transform of ranks to scores.


Pendergrass has considered the use of discrete scoring scales and
the possible loss in efficiency in using scores instead of actual
observations on a continuous variable, on the assumption that such
observations could have been available. He is also working on the
extensions of the model for paired comparisons to ranking in triple
comparisons or in incomplete blocks of size three. In terms of parameters and
notation similar to those discussed in the section on paired comparisons,
the appropriate model for triple ranking seems to yield the probability,

$P(X_i < X_j < X_k) = \frac{\pi_i^2\,\pi_j}{\pi_i^2\pi_j + \pi_i^2\pi_k + \pi_j^2\pi_k + \pi_j^2\pi_i + \pi_k^2\pi_i + \pi_k^2\pi_j}$.

While it appears that the mathematics associated with this model may be
developed, it remains to be seen whether or not application will be simple
enough for applied use.

7. ACKNOWLEDGEMENTS. T. S. Russell and C. Y. Kramer generously contributed
their time to the preparation of the numerical examples in sections 3 and 4
of this paper respectively. We greatly appreciate this assistance.
APPLICATIONS OF ORDER STATISTICS IN MEDICAL EXPERIMENTS

B. G. Greenberg and A. E. Sarhan
Department of Biostatistics
University of North Carolina

1. Introduction. The use of the term "order statistics" connotes various
meanings and is here defined to assure understanding in its present usage.

Order statistics is that body of knowledge utilizing the rank or order
of an observation as well as its magnitude. In other words, it is a
combination of the techniques used in conventional statistics (which consider
the magnitude of the observation) with those of rank order statistics (which
consider only the relative rank of the observation).

A detailed bibliography of contributions to order statistics is not
presented here but several lists may be found in Mosteller [13], David
and Johnson, and a recent doctoral dissertation by Lott [11].

2.  Objective.  The  purpose  of  this  paper  is  to  illustrate  for  the  applied 
statistician  how  to  employ  recent  developments  in  order  statistics  for 
typical  statistical  problems. 

The first two examples will be selected to illustrate how order statistics
can be a powerful tool when observations are censored; that is, the exact
values of some observations are unknown because a barrier has been imposed by
the observer or the measuring process.

The third example will be chosen to illustrate how the use of order
statistics can enable the experimenter to estimate the mean and standard
deviation of a distribution with high efficiency without the tedium of using
all observations in a sample.

The last illustration will be an application of order statistics to the
problem of grouping observations into a frequency distribution.

The application of these techniques will arbitrarily be restricted to the
normal and single exponential distributions since probably they are of most
practical value. Order statistics have been applied, nevertheless, to other
distributions (e.g. Sarhan [18]), and the problem of truncation
and censoring has been considered in other distributions such as the
chi-distribution in Cohen [3], the Type III in Cohen [1], the Poisson in
Raj [16] and Cohen [2], and response-time distributions in Sampford [17].

3. Censored Observations. The word censored is applied to instances where
sampling is from an unrestricted population, but the exact magnitude of
specific observations in the sample may be unknown. The number of censored
observations in the sample is known and their ranking relative to some point
of censorship is also available.

Censored is different from the term truncated which is applied to
instances where sampling is from a restricted population so that the exact
numbers that would have occurred in the sample above and below the
truncation point are not known. Censored was first used in this context by
Hald [9] at the suggestion of Kerrich.
This difference between censoring and truncation is best emphasized by an
illustration. If one were to measure the heights of American military
personnel, the sample would be from a truncated population of heights because
members of that group have qualified by falling between certain minimum and
maximum allowable heights. In an industrial context, if one were to select
samples from lots that had undergone quality control checks to assure that
the manufactured units fell within specifications, the sample would again be
from a truncated population, other than for those samples accepted but which
should have been rejected.

In measuring the incubation period of a disease, or in life-testing, the
experimenter may not have sufficient time or facilities to await the
development of the phenomenon in all cases, and might censor the observations on the
d-th day (Type I) or after p per cent of the observations had responded (Type II).
Censoring is usually practiced at the extremes of the distribution. This
illustration of censoring observations in life-testing situations is with the
exponential distribution.

Censoring with the normal distribution might be for the same reasons or
for others equally important. In certain industrial applications (e.g.
tensile strength) the cost of measuring an observation at the extreme of the
distribution is relatively higher than elsewhere. That is, the cost of an
observation might be functionally related to its distance from central
tendency, and extreme observations are uneconomical to justify. Another
reason for censoring a normal distribution might be termed "precision
censoring." The measurement error at the extremes of a normally distributed variable
might be considerably greater than that of the observations toward the center
by having a flat-U-shaped distribution. This occurs in some situations where
measurements of physiological functions of the body are involved. Counterparts
for precision censoring in industrial and other applications undoubtedly are
also available.

The first example in censoring is with the single exponential distribution,
and the data are taken from Sarhan and Greenberg [21]. The number of days
incubation period resulting from an inoculation is a measure of the amount and
potency of the inoculum as well as the individual susceptibility of the test
animal.  Below  are  listed  the  unordered  responses  from  ten  rabbits  inoculated 
with  a  solution  containing  (0.2)  10-^  treponema  pallidum: 


Rabbit No.     Incubation in days
     1              <18
     2               18
     3              >45
     4              <18
     5               25
     6               21
     7               18
     8               25
     9               25
    10               21
Estimates of the earliest possible incubation period ($\alpha$), the mean ($\alpha + \sigma$),
and the standard deviation ($\sigma$) are desired from the two-parameter single
exponential distribution having the following function:

$f(y) = \frac{1}{\sigma}\exp[-(y - \alpha)/\sigma]$,   $\alpha \le y < \infty$,

$f(y) = 0$,   otherwise.

Coefficients from tables provided in the same paper from which these
data came make estimation of the two parameters possible despite the censoring
of three observations. The observations are rearranged in size order and the
coefficients written alongside as follows:


Ordered
observations          $\alpha$          $\sigma$        $(\alpha + \sigma)$

    < 18                 -                 -                 -
    < 18                 -                 -                 -
      18             3007/2160           -7/6            487/2160
      18             -121/2160            1/6            239/2160
      21             -121/2160            1/6            239/2160
      21             -121/2160            1/6            239/2160
      25             -121/2160            1/6            239/2160
      25             -121/2160            1/6            239/2160
      25             -242/2160            2/6            478/2160
    > 45                 -                 -                 -

Estimate                16.10             5.67             21.76

Variance (in
terms of $\sigma^2$)    0.0567991        0.1666667        0.1114288

Efficiency relative
to complete sample
(percent)               19.56            66.67            89.74

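The calculations are as follows:

$$
\alpha^* = \frac{1}{2160}\,\bigl[\,3007(18) - 121(18) - \cdots - 242(25)\,\bigr] = 16.10
$$

$$
\sigma^* = \frac{1}{6}\,\bigl[\,-7(18) + 1(18) + 1(21) + \cdots + 2(25)\,\bigr] = 5.67
$$

$$
(\alpha + \sigma)^* = \frac{1}{2160}\,\bigl[\,487(18) + 239(18) + 239(21) + \cdots + 478(25)\,\bigr] = 21.76
$$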
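
The arithmetic above can be checked with a short sketch. It assumes the coefficients are transcribed correctly from the table in the text; the names `obs`, `alpha_c`, `sigma_c`, `mean_c` and `linear_estimate` are illustrative only and are not taken from the Sarhan and Greenberg paper.

```python
from fractions import Fraction as F

# The seven uncensored ordered observations (days). The two "< 18" values
# and the one "> 45" value are censored and receive no coefficient.
obs = [18, 18, 21, 21, 25, 25, 25]

# Coefficients as printed in the table (assumed transcribed correctly).
alpha_c = [F(3007, 2160)] + [F(-121, 2160)] * 5 + [F(-242, 2160)]
sigma_c = [F(-7, 6)] + [F(1, 6)] * 5 + [F(2, 6)]
mean_c = [F(487, 2160)] + [F(239, 2160)] * 5 + [F(478, 2160)]


def linear_estimate(coeffs, values):
    """Weighted sum of ordered observations, returned as a float."""
    return float(sum(c * v for c, v in zip(coeffs, values)))


alpha_hat = linear_estimate(alpha_c, obs)   # earliest incubation period
sigma_hat = linear_estimate(sigma_c, obs)   # standard deviation
mean_hat = linear_estimate(mean_c, obs)     # mean, alpha + sigma

print(round(alpha_hat, 2), round(sigma_hat, 2), round(mean_hat, 2))
```

Exact fractions are used so that the sketch also shows that each coefficient for the mean is the sum of the corresponding coefficients for $\alpha$ and $\sigma$.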

The variances of each estimate are found in the tables of the same paper
and are also shown in the above table in terms of $\sigma^2$. Below the variances of
each estimate, the efficiency relative to the complete uncensored sample is
indicated. For example, the estimation of $\alpha$ is only 19.56 percent efficient
because two observations were missing on the left and one on the right. The
most valuable observations in estimating the start of the distribution,
viz. $\alpha$, should be expected to occur at the left hand side of the distribution.
This is verified by the fact that if all three missing observations had
occurred on the right hand side of the distribution instead of two on the
left and one on the right, the efficiency relative to the complete sample
would rise from 19.56 to 95.24 percent.


The censoring in this example was of the Type I variety, i.e. employing
fixed points on the abscissa. The coefficients used to estimate the parameters,
however, were based upon the assumption of Type II censoring. This
raises the important question of whether a bias exists and how much
lower the precision is because the known points of truncation, viz. 18 and
45 days, are not utilized in the estimating process.

Several authors, e.g. Sampford [17], have expressed the opinion that
the difference between the two is of no great import. The exact solution
of the loss in precision is a difficult problem, and work is in progress to
measure it. In the interim, we have conducted a sampling study to investigate
whether there is a bias, as well as the magnitude of the imprecision.
As a result of this investigation, we feel that there is no bias and the
order of magnitude of the imprecision is small, particularly in large samples.

The  sampling  study  consisted  of  Ifi  samples  of  size  10,  selected  from 
Rand's Table of 100,000 Normal Deviates ($\mu = 0$, $\sigma = 1$), and estimates of the
mean and standard deviation by both Type I and Type II censoring were compared
in each sample. For instance, below is a segment, chosen at random,
from the sampling study.

Sample No. 36        Censored* at -0.253 and ordered

     0.088                    (-1.729)
    -0.331                    (-1.075)
    -1.729                    (-0.467)
     1.209                    (-0.331)
     0.840                     -0.118
    -1.075                      0.082
     0.894                      0.088
    -0.118                      0.840
     0.082                      0.894
    -0.467                      1.209

* The ( ) indicates that the value was censored.
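
As a minimal sketch of what Type I censoring does to this sample, the following orders the ten deviates and marks those falling below the fixed point -0.253. The variable names are illustrative; in an actual censored sample only the number of censored values and the cut point would be known, not the values themselves.

```python
# Sample No. 36 as listed in the text.
sample = [0.088, -0.331, -1.729, 1.209, 0.840,
          -1.075, 0.894, -0.118, 0.082, -0.467]

cut = -0.253  # fixed censoring point on the abscissa (Type I censoring)

ordered = sorted(sample)
censored = [x for x in ordered if x < cut]    # values shown in parentheses
observed = [x for x in ordered if x >= cut]   # values available for estimation

# Mean of the complete (uncensored) sample, for comparison.
mean_all = sum(sample) / len(sample)

print(censored)
print(observed)
print(round(mean_all, 4))
```

The mean of the complete sample computed this way agrees with the uncensored-sample mean quoted in the comparison of methods.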

A comparison of the estimates of the mean and standard deviation calculated
from each sample included the following: the uncensored data; the maximum
likelihood method of Cohen [4] for Type I censoring; the method of Ipsen [10]
for Type I censoring; and the best linear estimate (BLE) with minimum variance
for Type II censoring.

The tabulation below gives some idea of the comparisons for the given
sample No. 36, thus indicating why it is thought that there is no bias.

Method of estimation          Mean         Standard deviation

Population                     0                1.0000
Uncensored sample             -0.0607           0.9520
Censored:
   Cohen (Type I)             -0.0283           0.8042
   Ipsen (Type I)             -0.0362           0.8375
   B.L.E. (Type II)           -0.0099           0.8527

Calculation of the BLE of the mean and standard deviation employed in
the foregoing sample for the normal distribution can best be demonstrated
by application to another example. The data are taken from Sarhan and
Greenberg [20] and represent Type I censoring at both extremes of the
sample.

The observations consist of ten individual systolic blood pressures
which were taken by persons learning to make such readings. Owing
to the relatively larger measurement error known to exist for beginners
at the extremes, the data thought to be less than 105 mm. and greater than
125 mm. were censored. This resulted in censoring two observations on the
left and three on the right.

The data have been arranged in size order, and alongside of them are
the coefficients to estimate the mean and standard deviation as follows:


Ordered
observations            $\mu$             $\sigma$

  1.    -                 0                  0
  2.    -                 0                  0
  3.   108            .20496319         -.88982266
  4.   111            .10382533         -.11005067
  5.   119            .11220127         -.02620385
  6.   121            .11982080          .05494874
  7.   125            .45918942          .97112842
  8.    -                 0                  0
  9.    -                 0                  0
 10.    -                 0                  0

Estimates               118.9              16.61

Variance (in
terms of $\sigma^2$)   .11795477          .17132071

Efficiency relative
to complete sample
(percent)               84.78              33.62

From the percentage efficiency given in the table, the estimate of the
mean was 84.78 percent efficient relative to the complete sample despite the
omission of 50 percent of the observations. The estimate of the standard deviation
does not fare as well, for its efficiency drops to 33.62 percent.
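
The estimates in the table can be reproduced with a brief sketch, assuming the coefficients are transcribed correctly. The variable names are illustrative. The efficiency of the mean is taken here as the ratio of the complete-sample variance of the mean, $\sigma^2/10$, to the tabled variance.

```python
# The five uncensored ordered blood pressures (mm.), order numbers 3 to 7.
obs = [108, 111, 119, 121, 125]

# Best-linear-estimate coefficients as printed in the table.
mu_c = [0.20496319, 0.10382533, 0.11220127, 0.11982080, 0.45918942]
sigma_c = [-0.88982266, -0.11005067, -0.02620385, 0.05494874, 0.97112842]

mu_hat = sum(c * x for c, x in zip(mu_c, obs))
sigma_hat = sum(c * x for c, x in zip(sigma_c, obs))

# Variance of the mean of a complete sample of n = 10 is sigma^2 / 10.
var_mu_censored = 0.11795477          # tabled value, in units of sigma^2
efficiency_mu = (1 / 10) / var_mu_censored

print(round(mu_hat, 1), round(sigma_hat, 2), round(100 * efficiency_mu, 2))
```

Two checks on the coefficients are worth noting: those for the mean sum to one and those for the standard deviation sum to zero, which this sketch can confirm to within rounding of the last printed decimal.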

4. Simplified Statistics. In the foregoing paragraph, mention was made of
the fact that the estimate of the mean from the sample was relatively efficient
although only 50 percent of the sample was used. The optimum combination of
half of the sample elements, if the estimation of the mean were to be maximally
efficient, might not be the five observations actually used. In fact, owing
to the impetus given by Mosteller [13], a whole branch of linear systematic
statistics has recently developed in which the purpose is to make efficient
estimates of the mean and standard deviation using 2, 3, 4, ..., k < n sample
elements.

The most readily identified measure of linear systematic statistics is,
of course, the median as an estimate of location. An estimate of dispersion
might be the range, the semi-interquartile distance, or others. We shall explore
these now in a little more detail using data from the exponential distribution
as an illustration.

The data come from Table I of Maguire, Pearson and Wynn [12], and
represent the time intervals in days between explosions in mines involving
more than 10 men killed from December 6, 1875 to May 29, 1951. The 109
observations have been rearranged in size order as follows:

Table 1. Time interval in days between explosions in mines
involving more than 10 men killed

Order   Observation      Order   Observation

   1         1             31        59
   2         4             32        59
   3         4             33        61
   4         7             34        61
   5        11             35        66
   6        13             36        72
   7        15             37        72
   8        15             38        75
   9        17             39        78
  10        18             40        78
  11        19             41        81
  12        19             42        93
  13        20             43        96
  14        20             44        99
  15        22             45       108
  16        23             46       113
  17        28             47       114
  18        29             48       120
  19        31             49       120
  20        32             50       123
  21        36             51       124
  22        37             52       129
  23        47             53       131
  24        48             54       137
  25        49             55       145
  26        50             56       151
  27        54             57       156
  28        54             58       171
  29        55             59       176


The single one-parameter exponential distribution represented by

$$
f(x) =
\begin{cases}
\dfrac{1}{\sigma}\, e^{-x/\sigma}, & x \ge 0, \\
0, & \text{otherwise}
\end{cases}
$$

has been shown to fit these data quite nicely. If the two-parameter exponential
distribution must be used, a simple transformation of location
can be used.

The estimate of the standard deviation $\sigma$, using all observations, is
equal to 241 days. To estimate the value of $\sigma$ using k < n, tables for
k = 1, 2, ..., 15 are available for the exponential distribution in a recent
paper by Ogawa [15]. For example, if k = 5 were selected, the relative
efficiency to the complete sample estimate would be 94.76 percent. From
Ogawa's table of optimum spacings for k = 5, one also learns that the five
sample observations which are best to use are as follows:

$$
\begin{aligned}
k_1 &= (109)(.39347) + 1 = 43 \\
k_2 &= (109)(.67044) + 1 = 74 \\
k_3 &= (109)(.84433) + 1 = 93 \\
k_4 &= (109)(.94387) + 1 = 103 \\
k_5 &= (109)(.98855) + 1 = 108
\end{aligned}
$$

The coefficients in the above formulation, viz. .39347, .67044, ...,
.98855, were obtained from Ogawa's Table II, and the calculations are rounded
down to the nearest whole integer. Using that same table for the weighting
coefficients, the calculations for $\sigma^*$ are as follows:


Observation        Observation        Coefficient
  number          $x_{(n_i)}$           ($a_i$)

     43                 96              .33051
     74                271              .21896
     93                364              .13173
    103                745              .06668
    108               1613              .02286

Then

$$
\sigma^* = \frac{1}{0.9476} \sum_{i=1}^{5} a_i\, x_{(n_i)} = \frac{1}{0.9476}\,(225.56662) = 238.0
$$

This compares favorably with the value of 241 calculated from the complete
sample. If k = 10 were chosen, the efficiency would have risen to 98.32 percent
and the estimate would be 242.6.
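
The selection of order numbers and the weighted sum for k = 5 can be sketched as follows. The spacings, weights and divisor are the figures quoted in the text; the variable names are illustrative. The 108th ordered observation is taken as 1613, which is the value implied by the printed total 225.56662 and is treated here as an assumption.

```python
import math

n = 109  # number of observations

# Optimum spacings for k = 5, as quoted from Ogawa's Table II.
spacings = [0.39347, 0.67044, 0.84433, 0.94387, 0.98855]

# Order numbers: n * spacing + 1, rounded down to a whole integer.
ranks = [math.floor(n * p + 1) for p in spacings]

# Ordered observations at those ranks (1613 is inferred from the printed sum).
x = {43: 96, 74: 271, 93: 364, 103: 745, 108: 1613}

# Weighting coefficients and the divisor used in the text.
weights = [0.33051, 0.21896, 0.13173, 0.06668, 0.02286]
divisor = 0.9476

weighted_sum = sum(w * x[r] for w, r in zip(weights, ranks))
sigma_hat = weighted_sum / divisor

print(ranks)
print(round(weighted_sum, 5), round(sigma_hat, 1))
```

Only five of the 109 observations enter the estimate, yet the result differs from the complete-sample value of 241 days by about three days.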

If the parameters of the normal distribution are to be estimated by
simplified statistics, an earlier paper by Ogawa [14] provides the optimum
spacings for that distribution. Although Ogawa's spacings are optimal, other
combinations of sample observations may be much more convenient to use without
any great sacrifice in precision. Such systematic statistics can be
found in Mosteller [13], Dixon [l J, and Lott [11].

5. Grouping. The optimum spacings used in the previous section for the best
linear estimate of the parameters have been shown recently to have another
most interesting property in application to a normal distribution. Suppose
there are available observations on a continuous variable, and it is desired
to classify them, or the population from which they were drawn, into k groups.
This might be done either for purposes of convenience in exposition, calculation,
or for simplification in the collection of further observations. Thus,
if heights of individuals are to be classified as tall, medium and short, the
problem is to locate the dividing lines to be drawn without restricting ourselves
either to equally-spaced groups or groups with equal expectation of
frequency of observations. The criterion for grouping is that if the observations
in a group are to be represented by a group central value, the loss in
efficiency by this procedure should be a minimum. Furthermore, if this classification
is carried out, the loss of efficiency in k = 2, 3, 4, ... groups is
also of interest.


This same problem occurs when there are measurements on two continuous
variables for a given sample. The experimenter, instead of testing the
correlation between the two variables, may prefer for reasons of exposition
to group the x variable into k classifications so as to maximize the test of
the differences in the y variable among groups by an analysis of variance.
Regardless of the correlation between the two variables, D. R. Cox [5] has
recently shown that the solution for grouping the x variable comes down to the
problem of optimum spacings if the distribution is normal. Thus, if sample
data were available on heights and weights, classification of the individuals
into tall, medium, and short could be accomplished optimally by dividing the
range of heights such that the tall and short groups each had 27.027 percent
of the observations and the medium group had the remaining 45.946 percent.

This means that the limits of the three intervals would be as follows in a
unit normal distribution:

Short:     $-\infty$ to $-0.612$

Medium:    $-0.612$ to $+0.612$

Tall:      $+0.612$ to $+\infty$

The loss of precision in this arrangement from substituting one group
value for the individual observations results in an efficiency of 80.98
percent. In fact, the optimum groups and efficiencies can be obtained for
k = 2, 3, ..., 6 in Cox's paper and for k = 2, ..., 11 in Ogawa's paper.
The former's values are more nearly precise in the last decimal place than
those of Ogawa, and certain results of Cox are reproduced in Table 2 here
for convenience.

Table 2. Optimum grouping intervals for unit normal distribution with
percentage distribution of observations and efficiency

                                                      Percentage efficiency
 k     Group limits                 Percentage        relative to exact
                                    distribution      observation

 2     $-\infty$ to 0                  .500                63.66
       0 to $+\infty$                  .500

 3     $-\infty$ to -0.612             .270                80.98
       -0.612 to +0.612                .459
       +0.612 to $+\infty$             .270

 4     $-\infty$ to -0.980             .164                88.25
       -0.980 to 0                     .336
       0 to +0.980                     .336
       +0.980 to $+\infty$             .164

 5     $-\infty$ to -1.230             .109                92.01
       -1.230 to -0.395                .237
       -0.395 to +0.395                .307
       +0.395 to +1.230                .237
       +1.230 to $+\infty$             .109

 6     $-\infty$ to -1.449             .074                94.20
       -1.449 to -0.660                .181
       -0.660 to 0                     .245
       0 to +0.660                     .245
       +0.660 to +1.449                .181
       +1.449 to $+\infty$             .074

The information in Table 2 indicates that the substitution of a binomial
classification (k = 2) for a normal variate results in an efficiency of 63.66
percent. This particular figure of efficiency may be helpful in estimating
required sample sizes in some experiments where there is no experience with a
variable to be measured but there is some information on a binomial plane.
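
The efficiencies quoted for the grouped unit normal variate can be checked with a short sketch. It assumes that efficiency is the proportion of the unit variance retained when every observation in a group is replaced by that group's mean; this definition is an assumption of the sketch, not a statement of Cox's derivation, though it reproduces the tabled figures. The function names are illustrative.

```python
import math


def phi(x):
    """Standard normal density; taken as zero at plus or minus infinity."""
    if math.isinf(x):
        return 0.0
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def grouping_efficiency(cuts):
    """Between-group variance of a unit normal split at the given cut points.

    For a group (a, b) with probability p = Phi(b) - Phi(a), the group mean
    is (phi(a) - phi(b)) / p, so the group contributes
    (phi(a) - phi(b))**2 / p to the between-group variance.
    """
    edges = [-math.inf] + sorted(cuts) + [math.inf]
    total = 0.0
    for a, b in zip(edges, edges[1:]):
        p = Phi(b) - Phi(a)
        total += (phi(a) - phi(b)) ** 2 / p
    return total


for cuts in ([0.0], [-0.612, 0.612], [-0.980, 0.0, 0.980]):
    print(len(cuts) + 1, round(100 * grouping_efficiency(cuts), 2))
```

For k = 2 the result is exactly $2/\pi$, which is the familiar efficiency of a dichotomized normal variable and agrees with the 63.66 percent in Table 2.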

There are two points worth mentioning about the use of results in this
section. The first is that the solution for optimum groupings is identical to
optimum spacings for the estimation problem in the case of the normal distribution.
This does not appear to be true in general, however, and the
rectangular distribution is a case in point.

Secondly, after grouping has been performed, whether by the methods outlined
here or not, the estimation of the mean and standard deviation will be
made by referring the observations in a group to some central value of that
group. Ordinarily, Sheppard's corrections are applied during this step if
the groups are equidistant. These corrections may lead to inconsistent
estimates for both the mean and standard deviation even when the grouping is
equidistant. Consistent estimates for both the mean and standard deviation
can be made by maximum likelihood according to the method outlined by
Gjeddebaek [8].


6. Summary. Recent contributions to the techniques of order statistics
have been applied in three instances. The first is in the
case of censored observations with both the normal and exponential distributions.
The second application involves estimation of the parameters of the
same distributions using k < n of the sample elements. The final illustration
is of an application of grouping continuous data into frequency
classifications.

PROBLEMS IN A PARTICULAR MILITARY FIELD EXPERIMENT


Kenneth  L.  Yudowitch 
Operations  Research  Office 

I should like to commence my remarks with the enunciation of the three
principles which I propose for guidance in the design of military field experiments,
the subject of this conference. The three principles are: (1) the
exploitation of ignorance; (2) the agglomeration of imponderables; and (3) the
balance of weights. In case these designations are not patently clear, I
shall attempt to illustrate each of the three principles.

I. 

The first principle (Fig. 1)* is rooted in the general technical ignorance
of our customer, the soldier, who might well quote from Sheridan's
Rivals: "Egad, I think the interpreter is the hardest to be understood of
the two!" It is perhaps a horrid admission which should be classified, but
probably not one Lt. Col. per Pentagon ring can define a Graeco-Latin Square.
To illustrate the point, consider a simple statistical test which one might
apply to a soldier. We offer him a bet on the drawing of straws, demonstrating
first a sample drawing of ten straws from a population of many hundreds.
The soldier is offered a fifty-fifty bet on the selection of a straw of his
choice, either long or short. Let us suppose the demonstration drawing
yields six long straws and four short straws. As any of us here could tell
the soldier, all that he can reliably say about the probability of picking
a long straw is that it is significantly greater than 24 percent. This is
the customary acceptable lower 95 percent confidence limit on the probability
of picking a long straw. And yet our investigations show that the soldier
will accept the bet and select the long straw. What is furthermore discouraging
is that the soldier will take our money on such bets.

It is clear that the customer frequently ignores such refinements as
95 percent confidence limits. How to deal with such a barbarian? Search
out what his question really is before phrasing the objective of an investigation.
Then design a test to answer only those objectives in the same language
in which the question was asked. If he is uninterested in the beauties of
symmetrically oriented test designs, let us exploit this ignorance (though
it pain our sensitivity) and make the crude minimum design required. Insistence
upon revision of the customer's question to an extent which
eventually restricts our ability to answer his real-world question is
delusory self-indulgence.


II. 

The principle of the agglomeration of imponderables (Fig. 2) is perhaps
best clarified with an illustration from immediate experience. Some time
ago I was asked to design an experiment which entailed 180,224 sets of conditions
of measurement. The experiment requested had the objective of
measuring the relative hit probabilities of eight types of ammunition. Also
indicated was some interest in the particular effects of various related
parameters, such as qualifications of the subjects, positions of firing,
conditions of illumination, and a mass of variables associated with the targets.


* Figures appear at end of the article.

These target parameters are listed here (Fig. 3). Although obviously
in true context each of these parameters exists in a continuum, the range
of variations must here be represented by a restricted few values, so we
begin agglomeration of these imponderables by arbitrarily selecting the
numbers of values to be used for each parameter. As indicated, the first
four are represented by only two values each. The justification for this
limitation is that careful selection of two rather extreme values will
permit recognition of the existence and general magnitude of effect of
variation of any one of the parameters, and probably permit rough interpolation.


An immediate agglomeration is provided by the context for the last two
of the half-dozen parameters, when there is a marked interdependence in
these two parameters. As both characteristics are presented in any one
target, the number of the combinations of the two parameters is limited to
the number of targets. A systematic attempt to represent all of the combinations
of values indicated for each of the six (now five) parameters
results in 352 combinations of parameters, which in this experiment would
mean 352 different targets. It is clear, however, that the first four
parameters, as well as the last two, are also ultimately represented in each
of the individual targets of the system. Our preliminary investigation also
revealed considerable interdependence among all of these parameters. It is
then perhaps the ultimate application of the principle of agglomeration of
imponderables to dump the variations of all six parameters into one presentation
of the target system. As the facts of life limited us to 22 targets,
the application of the second principle results in a reduction of from six
times infinity to 352 to 22 representations of these half-dozen parameters.
Finally then, all 22 targets appear in a single sequence which we call a run.

In addition to the target characteristics, we are concerned also with
characteristics of the subjects, the environment and the test materiel.
Grouped here are these several parameters (Fig. 4). The subjects come to
us in four formal qualifications. From the variety of firing positions as
above, we selected two; similarly for the illumination. These three parameters
then yield a product of 16 combinations. Qualification varieties
were simply deleted from the experiment proper, and four special runs programmed
for measurement of variation in this parameter.

The handling of the four possible combinations of position and illumination
is a nice illustration of a corollary of the second principle. The
little block diagram (Fig. 5) represents the four possible combinations
(day and night, sitting and standing). If there is a degree of independence
between variations in position and variations in illumination, it is quite
possible to infer the value for a fourth box of this square array, given
three. In this instance we elected to make measurements for both positions
in the daytime and for the sitting position at night. We thus obtained two
measures of score degradation, one for the shift of position from sitting
to standing, and the second for the shift of illumination from day to night.
Presumably the score for the unmeasured category (night standing) is deduced
by applying both degradation factors multiplicatively to the day-sitting
score. Thus we reduce 16 combinations of these three parameters to only
three.
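
The multiplicative inference of the fourth box can be written out as a small sketch. The scores used are hypothetical figures chosen for illustration, not results from the experiment, and the function name is likewise illustrative. The calculation rests on the stated assumption that position and illumination act independently.

```python
def infer_night_standing(day_sitting, day_standing, night_sitting):
    """Infer the unmeasured night-standing score from the three measured boxes.

    Each degradation factor is the ratio of a degraded score to the
    day-sitting score; both are applied multiplicatively.
    """
    position_factor = day_standing / day_sitting        # sitting -> standing
    illumination_factor = night_sitting / day_sitting   # day -> night
    return day_sitting * position_factor * illumination_factor


# Hypothetical hit scores, for illustration only.
score = infer_night_standing(day_sitting=0.60, day_standing=0.48, night_sitting=0.45)
print(round(score, 2))
```

If position and illumination interact, the inferred value will be biased, which is why the text conditions the device on a degree of independence between the two.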

We come to the last two factors, the ammunition itself and variation in
subjects (Fig. 6). Our experiment has specified eight ammunitions. The
number of samples of subjects required depends on the anticipated variation
from sample to sample. As a compromise with practicality, four samples
were agreed upon. This number seems sufficient to give a fairly reliable
indication of the degree of variation among samples: The average is meaningful
if the variations are small; and the opportunity for variation is adequate
to indicate whether a larger number of samples was required. There is no
simple means of agglomeration here, so that our ammunition and population
parameters leave us with 32 separate experimental conditions.

Finally, however, a further agglomeration is made by limiting the
number of combinations of each of the four samples with each of the three
position-illumination categories. In this case, instead of the twelve
possible combinations, only eight of the combinations are selected so that
our ultimate number of runs is 3x32x8/12 or 64. In addition, the special
qualification runs consist of two ammunitions and two population samples,
making a grand total of 68 runs (Fig. 7). We have reduced the number of
experimental conditions from a grand total of 16x16x32 or 8192 runs by a
factor of 120, by application of the second principle.

As a very heavy schedule permitted a maximum of 8 runs per day, the
schedule of 68 runs (1496 conditions) required 8 1/2 days in the field,
following preparational field work. It is of interest to note that if
this same experiment were attempted without application of the second
principle, whatever conclusions might have been reached concerning the
test materiel would be totally obsolete, as the 8192 runs (180,224 conditions)
would take four years of steady work to complete. This is based
on a five-day week with Christmas, Independence Day, and Armed Forces Day
off.
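
The counts quoted in this section can be verified with a few lines of arithmetic. The figures are those given in the text; the variable names, the assumption of 52 working weeks per year, and the assumption that the three holidays fall on working days are additions made for this sketch.

```python
TARGETS_PER_RUN = 22   # all 22 targets appear in each run
RUNS_PER_DAY = 8       # maximum permitted by the schedule

main_runs = 3 * 32 * 8 // 12        # 8 of 12 sample/category combinations kept
qualification_runs = 2 * 2          # two ammunitions, two population samples
total_runs = main_runs + qualification_runs

full_runs = 16 * 16 * 32            # runs without the second principle

conditions = total_runs * TARGETS_PER_RUN
full_conditions = full_runs * TARGETS_PER_RUN

field_days = total_runs / RUNS_PER_DAY
reduction_factor = full_runs / total_runs

# Five-day week less three holidays (assumed to fall on working days).
working_days_per_year = 52 * 5 - 3
years_for_full = full_runs / RUNS_PER_DAY / working_days_per_year

print(total_runs, conditions, full_conditions)
print(field_days, round(reduction_factor), round(years_for_full, 2))
```

The reduction factor comes to a little over 120, and the unreduced experiment to just under four years, in agreement with the rounded figures in the text.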


III. 


Inherent in the illustration used for the second principle is the
application of the third principle of experimental design, the balance of
weights (Fig. 8). Quite obviously among the 180,224 combinations of parameters
possible in the illustrative experiment, there are some combinations
rather more important than others. It is essential that we consider not
only the nicety of design for simplified analysis procedures, but that we
consider why we are doing the experiment in the first place. The customer
(who is the quite ignorant fellow we spoke of earlier) is unconcerned with
statistical niceties. He is however very much concerned in finding certain
answers which are vital for his decisions, and somewhat less interested in
finding certain other answers which will be of incidental utility in guiding
his decisions or activities. Thus, for example, it may be that the customer
is very vitally concerned with comparative capabilities of two of the eight
types of ammunition under study, and somewhat less concerned with comparisons
involving the other six types. It is incumbent upon the experiment
designer to recognize this difference in interest, and to respond to it
with appropriate distribution or weighting of experimental effort. Any
refusal to consider balance of weighting of experimental effort is patently
unjustified, as justification for the performance of the entire experiment
springs only from the interest of the customer. Any weighting of sub-categories
of interest must in any honest experiment be reflected in the
experimental effort.

A second, more technical factor also affects appropriate weights of
portions of an experiment. For example, note the 8/12's of the possible
combinations of population sample with position-illumination which were
selected. The diagram (Fig. 9) illustrates the eight out of twelve possible
combinations selected. It is clear that emphasis has arbitrarily been
placed on the day-sitting runs, half each of the day-standing and night-sitting
runs having been deleted. The logical attempt to justify such an
asymmetrical procedure is as follows: In the first place the reduction
from twelve is strongly urged by practical limitations on the total experimental
effort. One might, however, expect a more symmetrical or uniform
mode of reduction. But a uniform reduction of the experiment threatens
that the resultant measurements may border on statistical unreliability,
merely because of the small sample size. The selection made obviously
permits all four population samples to be used with one of the conditions
of firing, providing a reliable measure for that condition. The other two
conditions (day-standing and night-sitting) are less reliably measured.
This is justified, because it is much more important to determine whether
the ammunition differences sought exist under any condition than it is to
determine the variations of this difference with the several conditions of
firing.


I should like to close my remarks with a carefully considered statement:
"My immediate point is that the questions involved can be dissociated
from all that is technical in the statistician's craft, and
that when so detached are questions only of the right use of human reasoning
powers, with which all intelligent people, who hope to be intelligible,
are equally concerned, and on which the statistician as such, speaks with
no special authority."

HUMAN ENGINEERING EXPERIMENT ON TUBE TESTER TV-2/U


Harold  Zweigbaum  and  Donald  Donaldson 
Signal  Corps  Engineering  Laboratories 

The extensive use of electron tubes in military equipments has led to
a widespread acceptance of the conventional tube tester as a maintenance
tool. In the early days of electronics, when the multi-element electron
tube was coming into general use, the adequacy of the tube tester to determine
the quality of the tube was quite satisfactory. Operating frequencies
were relatively low, and emission or transconductance tests were made under
conditions that rather closely approximated the actual operating conditions.

Now, however, not only has the number of tube types increased by
several orders of magnitude, but the applications have become more complex.
Operating conditions vary widely. The conventional tester, which
allows only one value of plate voltage and two values of screen voltage to
be applied to the respective elements, has not proven to be of value in
military depots. There the need has been for a tube tester that will allow
the application of variable voltages to the tube elements, so that a reading
of quality can be compared with the manufacturer's original acceptance data.

This requirement was satisfied by the development of the Electron Tube
Test Set TV-2/U, featuring continuously variable and metered voltages to the
several elements of the tube under test. While the flexibility thus attained
permits tube testing within the requirements of MIL-E-1, this desirable
operational feature has created human engineering problems.

After the tube tester was built and during its preliminary use at the
Signal Corps Engineering Laboratories, it became apparent that, even though
the front panel was laid out in a logical sequence, the number of manipulations
required to perform a test on a tube was conducive to error.

In order to ascertain whether or not the operation of the tube tester
placed undue reliance on operator capability, a statistical experiment
was designed, using the tube tester and actual operators. (See the picture
of the front panel layout of the TV-2/U at the end of this paper.)

As we can readily see, the versatility of the TV-2/U was obtained at
the expense of an increased number of controls, switches, and monitoring
meters. In the more conventional tube tester used for simple maintenance
applications, about twenty separate and distinct manual operations are
required in order to check the condition of an average receiving tube;
this same type of test in the TV-2/U requires that an operator go through
about thirty-four separate steps, or about a 70% increase. In order that
we might see what effect, if any, on the performance of an operator was
due to the increased number of manual operations, an experiment was
devised to provide data pertinent to the precision of measurement obtained
by a normal class of operators.

In planning the experiment, certain controlling conditions were evident.
It was necessary to select individuals whose performance could be considered
representative of that group. Further, since individual variation among
operators is to be expected, more than one individual was required in order
to permit a measure of the sampling variation. It was also necessary that
the equipment under consideration be tested across the range of its intended
operation in order to eliminate from the results any spurious homogeneity
occasioned by too narrow a range of study. The standard for comparison
that was chosen was the set of measurements obtained by laboratory engineers
who were familiar with the equipment and its operation.

The specific test schedules involved the choice of twenty-five (25)
electron tubes, five (5) from each of five (5) generic groups: pentodes,
triodes, voltage regulators, diodes, and rectifiers, thus covering most of
the operating range for which the TV-2/U is designed. Thyratrons were
excluded from the schedules. This group of tubes is one of the most difficult
to test, and it was decided to utilize statistical data from other tube
types in estimating operator precision. The premise of this decision was
that data from the thyratron tube type would be unnecessary should the results
from other types prove conclusive.

The test procedure was established in three phases. First, the selected
tubes were measured by laboratory engineers (one of the two classes of
operators) at Evans Signal Laboratory on two test sets, and the data recorded.
Second, the test sets were transported to the Tobyhanna Signal Depot in
Pennsylvania. After the initial training of depot personnel (the second of
the two classes of operators) in the use of the equipment, several weeks
were allowed for familiarization and actual use, during which time these
personnel tested over 4000 electron tubes. The sample tubes were then
measured on each test set by the depot operators, on separate days and
under the observation of a laboratory engineer. Third, after completing
and recording these measurements, the tubes and test sets were returned to
ESL, where the laboratory measurements were repeated and recorded. This
latter step insured against tube damage during the testing interval. We
should note that at no time were the depot operators aware that they were
participating in the experiment. They were merely informed that the tube
testers were being checked for ruggedness under constant use.

In order to avoid a possible influence of repetitious measurements on
the data, the measurements were performed by both classes of operators on
individual tubes, selected in a random fashion.

Analysis of the measurements involved an estimate of the sampling variance
for operators within each of the classes. This estimate was to be
derived from twenty-five pairs of measurements, the expected values of
which were specially chosen to cover the range of use of the tube tester.
Each of these estimates of the sampling variance would then be used in an
F test to determine whether or not the ratio of the estimated values was
consistent with the ratio expected from two random samples from the
same population. The hypothesis tested by the F test is: there is no
real difference between the use of the equipment by the laboratory engineers
and its use by the depot operators.

After the depot data had been recorded, it appeared that the second depot
operator possibly had received insufficient training in the use of the tube
tester. In approximately twenty-five percent of his trials he was unable to
adjust the tube checker so as to obtain a reading. Since the estimate of
operator variability is obtained from paired measurements on the same tube,
the data from this twenty-five percent were discarded, leaving the data from
nineteen tubes available for estimating the variance for depot operators.
The corresponding figure for laboratory engineers is 25 tubes.

It is to be noted at this point that this arbitrary deletion of data
deviating grossly from the average does not of itself invalidate either the
results or any conclusions that may be drawn therefrom. In the present
experiment it is recognized that the deletion acted to provide an indication
of higher reproducibility within the depot operator class than would
otherwise be evidenced.

The structure of the experimental design and its analytic procedures
have thus been preserved at the cost of a reduction of approximately
twenty-five percent of the data, by limiting the study to those 19 tubes upon
which all four operators obtained readings. (See Figures 1 and 2.)

As we can see by the chart, the only major contribution to variation
in results, other than that of tube variability, which was deliberately
introduced, is that variation contributed by the lack of reproducibility
within the class of operators.

Finally, an F test was applied to the data in order to confirm or
deny the hypothesis of equal precision of classes of operators. (See
Components of Variance Table at the end of this paper.)

For eighteen degrees of freedom in each measure of variance, the
5 percent probability, or 95% certainty, value of F is 2.22. The value
of F obtained from the measured data is 13.55. This result is definitely
significant and denies the tenability of the hypothesis of equal precision
of the two classes of operators.
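
The variance-ratio comparison can be illustrated with a short sketch. It rests on assumptions not stated in the paper: the variance formula is not given, so the sketch assumes each class's variance is estimated from the differences between the two operators' readings on the same tube, with the mean difference removed (which yields n - 1 = 18 degrees of freedom for 19 tubes). The function names and the sample readings are hypothetical; only the tabulated value 2.22 is taken from the text.

```python
from statistics import mean

# Tabulated upper 5 percent point of F for (18, 18) degrees of freedom,
# as quoted in the paper.
F_CRITICAL_5PCT_18_18 = 2.22


def within_class_variance(pairs):
    """Estimate measurement variance from paired readings on the same tubes.

    Each pair holds the two operators' readings on one tube. Removing the
    mean difference (a constant offset between operators) leaves n - 1
    degrees of freedom. This formula is an assumption, not the paper's.
    """
    diffs = [first - second for first, second in pairs]
    d_bar = mean(diffs)
    n = len(diffs)
    return sum((d - d_bar) ** 2 for d in diffs) / (2 * (n - 1))


def variance_ratio(depot_pairs, lab_pairs):
    """F statistic: depot-operator variance over laboratory-engineer variance."""
    return within_class_variance(depot_pairs) / within_class_variance(lab_pairs)


# Hypothetical readings (four tubes only, to keep the example short).
lab = [(100.0, 101.0), (50.0, 50.5), (75.0, 74.0), (20.0, 20.5)]
depot = [(100.0, 104.0), (50.0, 47.0), (75.0, 80.0), (20.0, 18.0)]

f_value = variance_ratio(depot, lab)
```

With 19 tubes per class, a computed ratio above 2.22 would lead to rejecting the hypothesis of equal precision, as the observed 13.55 does.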

It was concluded from this experiment that the Tube Tester TV-2 is too
complex an instrument to be used by depot personnel with any degree of
assurance that the results obtained by these operators will accurately
reflect the condition of an electron tube.

The results of this experiment show us that in order to have a truly
effective tube tester of the TV-2 type, it is necessary to eliminate a
high percentage of the operator manipulations. The Signal Corps Engineering
Laboratories have instituted a program to study the effects of applying
automatic processes to a tube tester. The study will take into account the
effects of the human engineering experiment and will be aimed at the practical
embodiment of an automatically controlled tube tester that will be at
least as small, light, and accurate as the TV-2/U.


METHODS  OF  ESTIMATING  LETHAL  DOSE  FOR  MAN 


Clifford  J.  Maloney 
Army  Chemical  Corps 

I. INTRODUCTION. It is of interest in medical and biological research
to be able to estimate the dose-response curves¹ for the mortality response
of humans to various infectious agents. Direct methods of experimentation
are neither practical nor ethically permissible; therefore indirect methods
of estimation are required. It is the purpose of this paper to show how
two routine types of biological measurement can be combined in various ways
to produce estimates of human mortality response to infectious agents. The
methods have the advantage of being absolute determinations depending on
no unverified extrapolations. The types of measurement which are required
are: (1) morbidity and mortality rates for man and other animals in natural
environments; (2) dose-response experiments for morbidity responses in
humans and for morbidity and mortality responses in other animals.

Morbidity statistics can be obtained for many diseases through routine
reports to health departments of cases of infectious diseases. Special
studies can be conducted to estimate the incidence of poorly reported or
non-reportable diseases. Adequate mortality statistics are usually available
because death certificates, specifying primary and associated causes
of death, are filed for almost one hundred percent of all deaths in the
United States. If need be, the death certificate data can be supplemented
by special surveys aimed at greater accuracy and/or completeness. The
numbers of cases and deaths within a segment of the population can readily
be converted to rates on the basis of existing census statistics, on
estimates of the current population based on previous census figures, or
on special sample surveys or enumerations.

Good experiments involving animal morbidity and mortality can readily
be conducted, assuming that the obstacle of funds to purchase and handle
quantities of animals is overcome. It is moreover quite possible that
animal experiments for other purposes can be exploited. On the other hand,
the requirement of fairly large numbers of experimental subjects for the
proper determination of dose-response probit lines may well interfere
with the conduct of human morbidity experiments. Nevertheless, experiments
utilizing modest numbers of human volunteers can be set up, so that desired
human morbidity probit lines can be defined, even though their parameters
may have more than desirable sampling error.

Three suggested methods, based on the types of data described above,
for estimating human mortality dose-response curves are given below. These
methods cannot be expected to produce precise estimates of probit parameters,
but they have an advantage over other suggested methods in that they utilize
human data to produce estimates of parameters which describe human responses
rather than depending on non-human data for such estimates. One suggested
solution to the problem of estimating lethal dose for man requires the
assumption that the mortality probit lines for certain simians, or other
animals, are "close" to those for humans. This latter method lacks logical
justification because we know that there is considerable variation in responses
to infectious agents among even closely related animals; i.e., the responses
to the same dose of an infectious organism by rhesus monkeys, by cynomolgus
monkeys, and by chimpanzees may differ markedly from each other. The three
methods outlined below are logically unimpeachable.


1. See Section II for a discussion of dose-response curves.

II. DIGRESSION ON DOSE-RESPONSE RELATIONS. Biological material is notoriously
variable. This does not mean, however, that it is subject to no law, but only
that the law has a statistical character. The response shown by an organism
to a hostile influence of physical, chemical, or biological nature is therefore
predictable only in terms of mean values over many individual responses.
The variability of the response to microbiological agents is greater than
that to chemical agents. The average response, of course, increases as the
quantity of agent increases. As the response cannot be less than zero nor
more than one hundred percent, a plot of the mean dose-response relation
would appear as:


Figure  1 


It has been widely verified² that converting doses to the corresponding
logarithmic values and transforming percent responses by the integral of the
normal curve of error usually converts the asymmetric curve of Figure 1 to a
straight line.


2. Finney, D. J., "Probit Analysis," 2d edition, Chapter 3.


Figure 2

Probit Log-Dose Transformation of the Curve of Figure 1


It is of course true of this line, as of any straight line, that it is
determined as soon as one point on the line and the slope of the line have
been fixed. It is customary to choose for the point the one showing the
probit of 50% animal response, since this point requires less experimentation
for its measurement to a given degree of accuracy than any other point on
the line. This point is known as the 50% endpoint and symbolized as ED50.
Infectivity response is then ID50 and mortality response LD50. It is clear that
several distinct curves could be plotted on the graph of Figure 1 and that the
same transformation would reduce them all to straight lines corresponding to Figure 2.
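
The transformation described in this section can be sketched as follows. The sketch assumes the classical convention of adding 5 to the normal deviate (so that a 50% response has probit 5) and base-10 logarithms of dose; the function names and the slope and intercept values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(fraction_responding):
    """Normal deviate of the response fraction, plus 5 (classical convention)."""
    return 5.0 + _STD_NORMAL.inv_cdf(fraction_responding)


def response_fraction(dose, slope, intercept):
    """Fraction responding at a dose, given the probit line Y = slope*x + intercept,
    where x is log10(dose)."""
    y = slope * math.log10(dose) + intercept
    return _STD_NORMAL.cdf(y - 5.0)


def ed50(slope, intercept):
    """Dose at which the probit line crosses Y = 5, i.e. the 50% endpoint."""
    return 10 ** ((5.0 - intercept) / slope)


# Hypothetical probit line: slope 2 probits per tenfold increase in dose.
median_dose = ed50(slope=2.0, intercept=1.0)
```

Because the line is fixed by one point and the slope, quoting the ED50 together with the slope is enough to recover the response at any other dose.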

III.  METHOD  I.  This  method  arose  from  the  simple  consideration  that  a  dose 
sufficient  to  kill  must  be  sufficient  to  provoke  symptoms.  Hence,  if  a 
curve  of  doses  vs.  percent  showing  any  chosen  symptom  syndrome  is  plotted 
on  the  same  graph,  the  two  curves  cannot  cross. 

Figure  3 


Hypothetical  Curves  of  Mortality  and  Morbidity 


The only point to notice about the graph is that the death curve is beneath
the symptom curve at all doses. The probit transformation converts each of
these curves to a straight line. Now, as the curves did not intersect, neither
will the lines. Hence they are parallel. It is clear that the slope of the
mortality line is therefore known because it is the same as that of the
morbidity line.




Figure 4


If the infectivity and mortality probit lines for man for a particular
organism are parallel, we can estimate the LD50³ if we have (1) experimental
ID50⁴ data and (2) case and death rates observed in nature⁵. The following
steps lead to an estimate of the parameters of the mortality probit line.

(a) Fit a probit line (Y = ax + b) to the experimental infectivity
data.

(b) Compute the case rate in the population (CR).

(c) Compute the death rate in the population (DR).

(d) Using the equation for the infectivity probit line, compute
the theoretical infective dose for the observed case rate
(IDCR).

(e) The equation for the mortality probit line can be obtained
by using the slope of the infectivity probit line (as the
lines are parallel by the theory underlying this procedure);
the intercept b' is defined by the equation DR = a(IDCR) + b',
or b' = DR - a(IDCR), with DR expressed as a probit and IDCR
as a log dose.


3. Dose producing 50 percent deaths.

4. Dose producing 50 percent infection.

5. Using the route of infection which is of interest.


Figure 5

Use of Case and Death Rates to Fix Dose-Response Relation

(Horizontal axis: Dose. Lines shown: Y = ax + b, infectivity line;
Y' = ax + [DR - a(IDCR)], mortality line.)


The argument on which parallelism of morbidity and mortality lines is based
would not apply in the case of ancillary symptoms which do not lead to
death when aggravated. Hence, if such non-parallel lines are found, they
might be used to separate the symptom complex into those leading and those
not leading to death.
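
Steps (b) through (e) of Method I can be sketched numerically. This is a minimal illustration, assuming the infectivity line of step (a) has already been fitted (the fit itself is not shown), that x is log10 dose, and that rates are converted to probits by adding 5 to the normal deviate. The function names and all numerical values are hypothetical.

```python
from statistics import NormalDist


def probit(rate):
    """Normal deviate of a rate (as a fraction), plus 5."""
    return 5.0 + NormalDist().inv_cdf(rate)


def method_one(a, b, case_rate, death_rate):
    """Return (mortality intercept, log10 LD50) under the parallel-line assumption.

    a, b       : slope and intercept of the fitted infectivity line Y = a*x + b
    case_rate  : CR, observed case rate in the population (fraction)
    death_rate : DR, observed death rate in the population (fraction)
    """
    # Step (d): log dose at which the infectivity line gives the observed case rate.
    id_cr = (probit(case_rate) - b) / a
    # Step (e): same slope, intercept chosen so the line passes through (IDCR, DR).
    b_mortality = probit(death_rate) - a * id_cr
    # The LD50 is where the mortality line reaches probit 5.
    log_ld50 = (5.0 - b_mortality) / a
    return b_mortality, log_ld50


# Hypothetical inputs: infectivity line with ID50 at log dose 2,
# 50% case rate and 10% death rate in the exposed population.
b_mort, log_ld50 = method_one(a=2.0, b=1.0, case_rate=0.5, death_rate=0.1)
```

Since the death rate cannot exceed the case rate, the estimated LD50 always lies at or above the ID50 in this construction.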

IV. METHOD II. The human dose-response mortality probit line can be
estimated by securing a minimum of two measurements of mortality at different
measured dose levels. This might be done by measuring exposure and
mortality of (1) workers who work in an occupation with a high risk of
infection (for example, farmers and ornithosis⁶, animal fiber workers and
anthrax⁷, etc.); (2) laboratory workers exposed to an organism⁸; (3) groups
in the normal population exposed to relatively high doses of a causative
agent (for example, people living near dairies in Los Angeles and Q fever⁹,
residents of Leavenworth County, Kansas and histoplasmosis, etc.)¹⁰. Estimates
of the dose to which these people are exposed could be obtained by
intensive sampling of their environments. Cause-specific mortality figures
could be obtained by routine epidemiological methods.

The observed dosages and death rates would then be used to plot and/or
compute a mortality probit line.


6. Karrer, H., B. Eddie and R. Schmid, Barnyard fowl as a source of human
ornithosis. Case report, Calif. Med., 73(1950):55-57.

7. Dignam, B. S., Anthrax - an industrial disease, Conn. Med. J., 15(1951):316-17.

8. Sulkin, S. Edward and Robert M. Pike, Laboratory acquired infections, J.A.M.A.,
147(1951):1740-1745.

9. Shepard, Charles C., and Robert J. Huebner, Q fever in Los Angeles County,
Am. J. Pub. Health, 38(1948):781-788.

10. Furcolow, Michael L. and Jay Bitterly, Further studies of the geography
of histoplasmin in Kansas and Missouri, J. Kansas Med. Soc. (1951).


It is well to point out that this technique is wholly unrelated to the
exploitation of extraordinary laboratory accidents. In fact, such accidents¹¹,
due to recognizable discrete departures from the usual laboratory environment,
are to be regarded as unwelcome complicating factors, so far as morbidity and
mortality rates are concerned, though contributing fully to the case fatality
rate determination. Instead of attempting to infer the dose actually received
by cases, the average dose level of exposure both of reactors and of
non-reactors would be ascertained by sampling procedures¹². Then techniques for
computing bioassay with error in the dose¹³ ¹⁴ would be used.
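
Setting aside the dose-error refinement, the two-point construction of Method II can be sketched as below. The sketch assumes base-10 log doses and probits formed by adding 5 to the normal deviate; the function names and the dose and mortality figures are hypothetical and treat the measured doses as exact.

```python
import math
from statistics import NormalDist


def probit(rate):
    """Normal deviate of a rate (as a fraction), plus 5."""
    return 5.0 + NormalDist().inv_cdf(rate)


def two_point_probit_line(dose_1, rate_1, dose_2, rate_2):
    """Slope and intercept of the probit line through two (dose, mortality) points."""
    x1, x2 = math.log10(dose_1), math.log10(dose_2)
    y1, y2 = probit(rate_1), probit(rate_2)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def ld50(slope, intercept):
    """Dose at which the mortality probit line reaches probit 5."""
    return 10 ** ((5.0 - intercept) / slope)


# Hypothetical field data: 10% mortality at dose 10, 90% at dose 1000.
slope, intercept = two_point_probit_line(10.0, 0.10, 1000.0, 0.90)
estimated_ld50 = ld50(slope, intercept)
```

With only two points the line is determined exactly and carries no internal check on fit, which is why additional dose levels would be valuable where they can be obtained.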

V. METHOD III. Method II can be modified so as to eliminate the requirement
of direct measurement of dosage in natural environments. This can be done
if mortality rates at two unmeasured dose levels are known for both man and
some other animal species, and if a dose-response mortality curve can
be obtained experimentally for the same animal species.

The procedure of Method III is as follows:

1) Measure mortality both for humans and for a species of animals at
each of two (preferably widely different) dosage levels, say A and B. Call
the human mortality rates DRA_h and DRB_h, and the animal mortality rates
DRA_a and DRB_a.

2) For the same animal species, conduct a laboratory bioassay experiment
using animals trapped at the location under study, and calculate the
dose-response animal mortality probit line.

3) Using the animal mortality probit line, compute the doses at levels
A and B which correspond to DRA_a and DRB_a. Call these LDA and LDB.

4) Plot DRA_h vs. LDA and DRB_h vs. LDB on probit paper and connect the
points with a straight line. This is an estimate of the human dose-response
mortality curve.

Figure 6


11. Sabin, A. B. and A. M. Wright, Acute ascending myelitis following monkey
bite with isolation of virus capable of reproducing disease, J. Exp. Med.,
59(1934):115.

12. Ibach, Martha J., Howard W. Larsh, and Michael L. Furcolow, Epidemic
histoplasmosis and airborne Histoplasma capsulatum, Proc. Soc. Exp.
Biol. and Med., 85(1954).

13. Maloney, C. J., Calculation of median lethal dose when doses are subject
to Poisson errors, Unpublished.

14. Haley, David C., Estimation of the dosage mortality relationship when
the dose is subject to error, Technical Report No. 15 (1952), Applied
Mathematics and Statistics Laboratory, Stanford University.
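
The four steps of Method III can be sketched as follows. This is an illustration only: the animal probit line is assumed to be already fitted from the laboratory bioassay of step 2, doses are on a log10 scale, and probits add 5 to the normal deviate. The function names and every number are hypothetical.

```python
from statistics import NormalDist


def probit(rate):
    """Normal deviate of a rate (as a fraction), plus 5."""
    return 5.0 + NormalDist().inv_cdf(rate)


def log_dose_from_rate(rate, slope, intercept):
    """Invert a probit line Y = slope*x + intercept to get the log dose for a rate."""
    return (probit(rate) - intercept) / slope


def method_three(animal_slope, animal_intercept, animal_rates, human_rates):
    """Estimate the human mortality probit line from paired field mortality rates.

    animal_rates : (DRA_a, DRB_a), animal mortality at field levels A and B
    human_rates  : (DRA_h, DRB_h), human mortality at the same two levels
    """
    # Step 3: infer the two unmeasured log doses from the animal probit line.
    ld_a, ld_b = (log_dose_from_rate(r, animal_slope, animal_intercept)
                  for r in animal_rates)
    # Step 4: straight line through the two human points on the probit scale.
    y_a, y_b = (probit(r) for r in human_rates)
    slope = (y_b - y_a) / (ld_b - ld_a)
    intercept = y_a - slope * ld_a
    return slope, intercept


# Hypothetical data: humans less susceptible than the animal at both levels.
human_slope, human_intercept = method_three(2.0, 1.0, (0.30, 0.90), (0.05, 0.40))
```

The animal serves here only as a measuring instrument for dose; no assumption is made that the human and animal lines coincide or are parallel.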


VI. REMARKS. It is obvious that the preceding three "pure" methods do not
exhaust the various possibilities, and that "mixed" procedures may be employed,
or that several methods may be used and then combined to get a stronger
overall estimate than that offered by the separate procedures.

On the other hand, these methods are only applicable provided the route
of infection in nature is the route of interest.

The hypothesis of parallel morbidity and mortality probit lines can be
tested on humans utilizing ideas from Methods I, II, and III, if we can
obtain in the field human morbidity and mortality data for several doses,
and if we can either measure these doses directly or infer them from animal
responses, as in Methods II and III.


VII. INDEPENDENT ACTION MODEL. The probit transformation of the dose
response curve outlined in Section II and discussed in detail in Finney is
not the only one which has been proposed. Berkson¹⁵ has suggested a model
based on the use of the logistic curve rather than the integrated normal.
Little practical difference exists between these two methods¹⁶. An alternative
suggestion with enormous practical importance has been proposed, apparently
independently, by Goldberg, et al.¹⁷ and by Peto¹⁸. The suggestion had previously
been applied to transmission of plant virus¹⁹, and recently in connection
with the biological effects of radioactivity²⁰. An early treatment was given
by Yule²¹. A maximum likelihood procedure for fitting this curve has been
given by Chernoff and Andrews²². Peto²³ has shown that, if this model fits,
then every probit curve will have a slope whose numerical value is two,
irrespective of the agent, host, and route of administration. It is clear
that if this model is correct then nothing is required for a complete
determination of the dose response relation desired but the collection of
cause-specific death rates.

This consequence of the independent action model is so important that
it is essential to determine whether or not the theory is substantiated.
An estimate of the extent of experimentation required to provide tests of
this hypothesis has been furnished to the Aerobiology Branch at their request²⁴.
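
The slope property can be checked numerically under one explicit assumption: that the independent action model takes the one-hit exponential form P = 1 - exp(-k * dose), which this paper does not spell out. With that form, the slope of the probit against log10 dose near the 50% point comes out very close to two whatever the value of k. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def probit(fraction):
    """Normal deviate of a response fraction, plus 5."""
    return 5.0 + NormalDist().inv_cdf(fraction)


def one_hit_response(dose, k):
    """Assumed independent action form: each organism acts alone with chance k."""
    return 1.0 - math.exp(-k * dose)


def probit_slope_at_median(k, h=1e-4):
    """Central-difference slope of probit vs. log10(dose) at the 50% point."""
    log_d50 = math.log10(math.log(2.0) / k)  # dose giving a 50% response
    upper = probit(one_hit_response(10 ** (log_d50 + h), k))
    lower = probit(one_hit_response(10 ** (log_d50 - h), k))
    return (upper - lower) / (2.0 * h)


# The slope is the same for a highly infective and a weakly infective agent.
slope_strong = probit_slope_at_median(k=1.0)
slope_weak = probit_slope_at_median(k=1e-6)
```

The curve is not exactly straight on probit paper under this form, so the value two applies strictly only near the median; a fitted slope well away from two would count against the model.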


15. Berkson, Joseph, Application of the logistic function to bio-assay, J.
Am. Stat. Assn., 39(1944):357-365.

16. Haley, David C., op. cit.

17. Goldberg, L. J., H. M. S. Watkins, M. S. Dolmatz, N. A. Schlamm, Studies
on the experimental epidemiology of respiratory infections. VI. Relationship
between dose of microorganisms and subsequent infection or death of
a host, J. Inf. Dis., 94(1954):9-21.

18. Peto, S., A dose response equation for the invasion of microorganisms,
Biometrics, 9(1953):320-335.

19. Watson, M. A., Factors affecting the amount of infection obtained by
aphid transmission of the virus Hy. III, Phil. Trans. Roy. Soc.,
226, pp. 457-489.

20. Kimball, Allan, The fitting of multi-hit survival curves, Biometrics,
9(1953):201-211.

21. Yule, G. Udny, On the distribution of deaths with age when causes of death
act cumulatively, and similar frequency distributions, J. Roy. Stat. Soc.,
73(1910):26-38.

22. Chernoff, Herman and Fred Andrews, A large sample bioassay design, Tech.
Rpt. No. 17, Applied Mathematics and Statistics Laboratory, Stanford Univ.

23. Peto, S., op. cit. (pp. 329 ff.)

24. Statistics Branch Job No. 1699, Dose response equation for microorganisms.
Experimenters: Dr. Persichetti and G. Broadwater. Statistician: SP3 Richard
Lamm, 1955.


SOME STATISTICAL ASPECTS OF FATIGUE TEST PLANNING

V. A. Didio
Watertown Arsenal

In studying metal fatigue one is usually most interested in the
accumulated damage produced in a part or specimen that is subjected to
repeated stresses. This is studied experimentally by subjecting specimens
to repeated cycles of constant stress or constant deflection and observing
the number of cycles at which failure occurs.

The stress applied to the specimen may be due to different types of
loads, such as a compressive or tensile load, bending, torsion, or a
combination of such loads; and, even though a constant stress is applied to a
number of like specimens under as uniform a set of conditions as possible,
there is observed considerable scatter in the number of cycles at failure,
where failure can be defined in various ways. It could be considered as
fracture, or the experiment could be stopped and failure said to have
occurred in the specimen when some predetermined decrease in stiffness
is observed.

Scatter is inherent in the experimental results, due to the nonhomogeneity
of the material on the microscopic and sub-microscopic scale and localized
textural differences such as machining and heat-treating effects. The careful
experimenter tries, insofar as he can, to eliminate the possible causes of
these variations by standardizing techniques in preparing specimens. He makes
the specimens from the same bar, or at least the same heat. He heat treats
specimens under uniform conditions. He machines and polishes specimens such
that residual stresses will not be introduced. In addition, variability due
to the fatigue machine and its loading is reduced as much as possible. All
of these parameters lead to varying life spans for individual specimens,
as well as causing fracture at different positions along the specimens.

In spite of all these precautions, the life of one specimen differs
from that of the next such that results of fatigue tests show a much wider
scatter than the results of any other mechanical test. Even if the metal
or alloy were free of all impurities or imperfections, a variability in its
strength values would exist throughout its volume, because of its crystal
structure. Variability cannot be eliminated, and the scatter inherent in
fatigue tests is accepted as a basis for the need of statistical analysis.

The variation in number of cycles to failure of apparently similar specimens
subjected to the same level of repeated stress obscures the results of many
fatigue testing programs and makes it necessary to run a relatively large
number of tests in order to obtain the desired information.

Theoretical explanations of the internal processes in the specimen
which lead to failure are many and varied. They range from consideration
of atomic dislocation movements to gross slip in individual crystals. Any
attempt at a theoretical explanation of as complex a phenomenon as fatigue
must necessarily appear as an over-simplification of the behavior of real
materials.

The results of fatigue tests are usually presented in the form of S-N
diagrams or curves; i.e., the stress S is plotted vs. the number of cycles N
to failure. Usually these diagrams are determined by a rather arbitrary
process of curve fitting through a relatively small number of points, which
represent results of individual fatigue tests performed at several stress
levels. They are obtained as "lines of best fit", in which case they are
assumed to refer to the average fatigue performance of the specimens. Such
presentation of the fatigue tests is necessarily inadequate, since it
neglects a very significant aspect of all fatigue data, their scatter. Another
deficiency of these S-N diagrams is that they are valid only within the range
of stress amplitude under the repeated application of which a specimen
actually fails, while our interest may be elsewhere.

Because of the significance of the scatter and its expected variation
with the applied stress amplitude, results of fatigue tests can be
effectively presented only by a relation between the stress, S, the number
of cycles, N, and the probability, P, that any specimen subject to N
repetitions of the stress amplitude, S, will actually break at or before
N cycles (mortality function) or the probability that it will survive this
number of cycles (survivorship function). This can be accomplished with
statistical techniques, thus making a fuller use of the information present
in fatigue results, while presenting them in a manner that is more
meaningful and accurate.
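
As a minimal sketch of the mortality and survivorship functions at a single
stress amplitude, the following Python fragment computes the observed
fraction of specimens surviving beyond each recorded N. The function name
`empirical_survivorship` and the five fatigue lives are hypothetical, chosen
only for illustration; they are not data from the tests discussed here.

```python
def empirical_survivorship(cycles_to_failure):
    """Return (N, fraction surviving beyond N) pairs for one stress level."""
    ordered = sorted(cycles_to_failure)
    n = len(ordered)
    # After the m-th failure (m = 1..n), n - m specimens are still intact.
    return [(N, (n - m) / n) for m, N in enumerate(ordered, start=1)]


# Hypothetical lives (cycles) of five specimens at one stress amplitude.
lives = [120_000, 95_000, 310_000, 150_000, 210_000]
for N, surv in empirical_survivorship(lives):
    print(f"N = {N:>7d}  survivorship = {surv:.2f}  mortality = {1 - surv:.2f}")
```

With so few specimens the step values are coarse, which is one way to see
why small probabilities of failure cannot be read directly from such a
table and have to come from a fitted distribution.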

In the design of structures and machine parts which will be subjected
to loads and vibratory stresses, a reasonable safety from fatigue failures
must be ensured, indicating a special concern with very small probabilities
of failure or large probabilities of survival. These cannot be found directly
by experimentation without testing a very large number of specimens.
Therefore, results of fatigue tests are useful only if combinations of (SN)
can be predicted by extrapolation, at which the probability of survival can
be made as close to unity as desired with respect to the specified factor of
safety. Such extrapolation beyond the range of the actual experiment, however,
requires an adequate knowledge of the character of the (SN) probability
surface, and thus of the statistical distribution of N for constant values
of S, as well as of S for constant values of N, particularly in the vicinity
of the endurance limit where the probability of survival approaches one.

We are thus left with finding a mathematical approximation of the
fatigue phenomena as expressed through data collected by experimental
studies. If we suppose the existence of an exact relationship between the
life of a specimen and the stress to which it is subjected, and approximate
it by some mathematical expression, it will be readily found that even if
the approximation is not very close the number of tests necessary to reveal
the difference between the exact and approximate relation will be
surprisingly large, owing to the wide scatter present in the observed fatigue
lives. An improvement of the approximation, either by changing the function
or by increasing its number of parameters, will soon bring us to a position
of being unable to decide experimentally whether or not there are any
divergences. As a consequence, there may exist two or more relationships of
different shapes that satisfactorily represent the data. Therefore, the
only reasonable way to act seems to be to choose a function which most
easily gives answers to posed questions and is still consistent with known
fatigue properties of materials.

A distribution of fatigue life of specimens subject to a given stress
amplitude that represents the actual distribution of test results rather
closely is obtained by assuming that, in each large group of specimens
tested at the same stress amplitude and subject to a number of load cycles,
the specimen that actually fails at this number is necessarily the weakest
specimen. Hence, the specimens that fail at various numbers, N, of load
cycles may be considered as forming a group of the weakest specimens out of
(large) samples of the population tested; to the analysis of the
distribution of N in this group, the theory of extreme values might therefore
be applied. The distribution of extreme values can thus be derived from any
reasonable assumption concerning the distribution of the population from
which the extremes are drawn. It must be noted that the use of the extreme
value distribution has its strongest justification in that it is, as far
as can empirically be established, a good approximation to actual test
results.

As in most experimental investigations, the probability functions are
actually determined from the test results. The direct determination of
the frequency distribution would require a much larger number of
experiments than can usually be performed. There is also available an
extremal probability paper on which a graphical indication may be obtained
concerning the possibility that a variable has an extreme value distribution.
This would be shown by a straight line relationship between the variable
and a reduced statistical variate, similar to the use of normal probability
paper.
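
The straight-line check made on extremal probability paper can be imitated
numerically. The sketch below is only an illustration under stated
assumptions: it takes the smallest-value survivorship form
l(x) = exp(-e^y), so that the reduced variate is y = ln(-ln l), and it
assigns the m-th smallest of n observations the plotting position
l = 1 - m/(n + 1). The function names and the sample lives are
hypothetical.

```python
import math


def reduced_variate_points(values):
    """Pair each ordered observation x with its reduced variate y.

    Assumes l(x) = exp(-e^y), i.e. y = ln(-ln l), with the plotting
    position l = 1 - m/(n + 1) for the m-th smallest observation.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for m, x in enumerate(ordered, start=1):
        survival = 1.0 - m / (n + 1)
        points.append((math.log(-math.log(survival)), x))
    return points


def straight_line_fit(points):
    """Least-squares line x = a + b*y and the correlation coefficient r."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    syy = sum((y - mean_y) ** 2 for y, _ in points)
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in points)
    slope = sxy / syy
    return mean_x - slope * mean_y, slope, sxy / math.sqrt(syy * sxx)


# Hypothetical fatigue lives; log N is used as the variate.
lives = [95_000, 120_000, 150_000, 210_000, 310_000, 480_000]
a, b, r = straight_line_fit(reduced_variate_points([math.log10(n) for n in lives]))
print(f"intercept = {a:.3f}, slope = {b:.3f}, r = {r:.3f}")
```

A correlation coefficient near one corresponds to points lying close to a
straight line on the paper; it is a graphical indication, not a formal test
of the distributional assumption.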


Under certain assumptions concerning the theoretical processes that
produce fatigue, the fatigue life of the population at a particular stress
level can be shown to be logarithmic normal, so that the distribution of
log N in the population of specimens can be expected to be normal. This
normality of log N was first noticed in the results of experimental tests.
The specimens that actually break at given values of log N are thus the
weakest specimens in samples of the normally distributed population of
fatigue lives.

In dealing with the exact distribution of extremes, many difficulties
are encountered in numerically evaluating it, even when the initial
distribution is known. To overcome this obstacle, asymptotic distributions
valid for large samples were derived [1]. These asymptotic distributions vary
depending on the initial distribution from which the extremes were taken
and on whether the variate being considered is limited or unlimited
in the direction of the extreme being considered. When the initial
distribution is of the exponential type, as for example the Normal
Distribution, we have the first equation, which is the asymptotic probability
of the smallest value x. Here y is a reduced variate analogous to the
standardized variate used in normal distributions, $\alpha$ is a measure of
dispersion, and $u$ is the mode of the distribution of x.


[1] Gumbel, E. J., "Statistical Theory of Extreme Values and Some Practical
Applications"; N.B.S. Applied Math. Series #33.

(1)  $l(x) = \exp\left(-e^{y}\right)$

where

$y = \alpha (x - u)$

(2)  $P(x) = \exp\left[-\left(\frac{x - w}{v - w}\right)^{k}\right]$

$x \ge w, \quad v > w.$

The function represented by the second equation is for the extremes
of smallest values, where k is a measure of dispersion and the variate x now
has a lower limit. This function can also be derived from the first equation
by a logarithmic transformation of the variate and is known as the third
asymptotic probability function.
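
The logarithmic transformation mentioned above can be written out in a few
lines. This is a sketch under assumptions made here for illustration: the
lower limit is called $w$, the characteristic value is called $v > w$, and
the first equation is applied to the transformed variate $z = \ln(x - w)$
with mode $\ln(v - w)$ and dispersion parameter $k$.

```latex
\begin{align*}
\exp\left[-e^{\,k\,(z - \ln(v - w))}\right]
  &= \exp\left[-e^{\,k \ln\frac{x - w}{v - w}}\right],
     \qquad z = \ln(x - w) \\
  &= \exp\left[-\left(\frac{x - w}{v - w}\right)^{k}\right],
     \qquad x \ge w .
\end{align*}
```

At $x = v$ the exponent equals $-1$, so the probability of survival there
is $1/e$, the same value the first equation gives at $y = 0$.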

Before analyzing our survivorship function, we must make assumptions
concerning the existence of an upper or lower limit of the function, which
must be based on experimental facts. This will determine our choice of
analysis, for while we usually are more interested in the endurance limit
of a specimen there is also the problem of whether or not there is a
minimum life for this sample, i.e., a certain N at a stress level, S, such
that failure will not occur below this number of cycles. In fatigue tests,
the stress is kept constant and the number of cycles to failure N noted for
each group of specimens, implying that such an N exists and is greater than
zero, although this may not be true for soft metals. Thus, since our variate
N is limited, we use what is referred to as the third asymptotic probability
function.

Our design of the experiment will also depend on what particular aspect
of fatigue we are interested in: endurance limit, minimum life, or median
fatigue life. This will determine the placement of the various numbers of
stress levels that we will use. The stress levels should be sufficiently
far apart to make the test results significantly different, but near enough
to allow us to construct a survivorship function.

In  particular,  the  object  of  most  fatigue  programs  is  the  determination 
of  the  endurance  limit  of  a  specimen  or  part.  The  true  endurance  limit  is 
the  greatest  stress  for  which  the  probability  of  surviving  an  infinite  number 
of  cycles  equals  one.  The  estimate  of  the  true  endurance  limit  cannot  be 
checked,  since  we  cannot  let  the  testing  machine  run  for  an  indefinite  number 
of cycles. For this reason, it has been customary in testing steels to
replace infinity by 10^7 cycles and to define the endurance limit as the
largest stress for which the probability of surviving 10^7 cycles at this
stress is one. At this stress level, failure becomes independent of N; that
is, the (SN) curves become parallel to the N axis.


Estimation of the endurance limit is based on a specific interpretation
of the existing data by using probabilities of survival and statistical
theory. If an analytic expression for S as a function of N could be derived
from physical considerations, its extrapolation for N → ∞ would lead to
knowledge of the endurance limit. Since no such expression is known, the
endurance limit stress has to be estimated by extrapolation from the
probability of survival valid for large values of N. For this purpose, we
need a specific distribution theory.

Theoretical considerations such as those which led to the extreme value
distribution can lead us to approximations of the observed physical phenomena.
These approximations must be tested against experimental results that are
sufficient and accurate enough to enable us to arrive at some conclusions,
or at least to indicate how close our approximation stands to reality. Our
need now is for verification or alteration of our premises by
experimentation.

The tool for our estimation of the endurance limit is the probability
of permanent survival, which is a function of stress. This probability will
be estimated from the number of specimens that failed and the number that
survived at different stress levels. These will be analyzed with the help
of the asymptotic theory of smallest values of a non-negative variate and,
in turn, will lead to an estimation of the endurance limit. Available
probability tables for the analysis of extreme-value data aid us in
determining our parameters and in estimating the endurance limit.

The probability of permanent survival is usually determined from
experiments performed in the following manner:

A number of specimens is subjected to a constant maximum stress during
an increasing number of cycles, N, up to failure. The number of cycles at
failure, N, is recorded, or if no failure occurs the experiment is stopped
at a high number, say N = 10^7 or 10^8. Specimens are usually tested first
at a stress level such that either all specimens fail or a small proportion
survives. This experiment is then repeated for a number of different
stresses. From these results, we can determine the probability of survival
as a function of the variate N for each value of S tested, noting that for
constant N the probability of survival increases as the stress decreases.

These probabilities could also be determined by subjecting a number
of specimens to a fixed stress, S, and stopping the experiment at a
predetermined number of cycles, noting the proportion of survivors at each
stress. This would be repeated for the same number of cycles at lower
and higher stresses, such that the range of variation of the stress reached
from the low stress where all specimens survive up to the high stress where
all specimens fail for the same number of cycles. These results would
enable us to determine the probability of survival as a function of S for
a constant number of cycles, N, where the stress, S, now takes on the role
of a statistical variate, although it is constant within each experiment.
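
The second procedure amounts to counting survivors at a fixed number of
cycles for each stress level. A minimal sketch follows; the function name
`survival_by_stress`, the stress values, and the lives are all hypothetical,
and a specimen whose test was stopped without failure (a run-out) is marked
`None` and assumed to have been run at least to the cutoff.

```python
def survival_by_stress(results, n_cutoff):
    """Proportion of specimens surviving n_cutoff cycles at each stress.

    results maps a stress level to a list of cycles to failure; None marks
    a specimen that had not failed when its test was stopped (a run-out).
    """
    proportions = {}
    for stress, lives in sorted(results.items()):
        survivors = sum(1 for n in lives if n is None or n > n_cutoff)
        proportions[stress] = survivors / len(lives)
    return proportions


# Hypothetical results: stress level -> cycles to failure (None = run-out).
data = {
    40: [None, None, None],
    50: [2_000_000, None, 500_000],
    60: [100_000, 300_000, 200_000],
}
for stress, p in survival_by_stress(data, n_cutoff=1_000_000).items():
    print(f"S = {stress}: proportion surviving = {p:.2f}")
```

In this made-up example the proportion surviving falls from one at the
lowest stress to zero at the highest, the range of variation the text asks
the stress levels to span.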

[2] "Probability Tables for the Analysis of Extreme Value Data"; N.B.S.
Applied Math. Series #22.

Thus, for a constant value of the probability of survival, there
corresponds a series consisting of different numbers, N, as a function
of S, or S as a function of N, such that, if N is plotted on the abscissa
and S on the ordinate, (SN) curves are obtained where S decreases for
increasing values of N for any constant probability.

These three representations of fatigue data are linked; each one must
be compatible with the other two, making it unacceptable to use an
empirical relation for one of these functions if it contradicts the
theoretical properties of the other two functions. Also, any conclusion
drawn from an alleged discontinuity of an (SN) curve must be wrong, since
there is no reason to doubt the continuity of the survivorship functions.

In Figure 1 we have a schematic (SN) diagram where each curve
corresponds to a fixed probability of survival. The top curve corresponds to
a small probability of survival and, for all combinations of S and N above
this curve, failure is practically certain.

The middle curve is for a probability of survival 1/e = 0.36788. The
S and N values for this curve are called the characteristic stresses and
number of cycles to failure, respectively. These values arise when y = 0
in Equation (1).

The lowest curve in Figure 1 consists of S and N values before which
no failure occurs. From this curve we can find our endurances at any
number of cycles. For values of S and N below this curve, survival is
certain in a probability sense.

Notice that the (SN) curves become parallel to the N axis as N approaches
10^7. The stresses at this number of cycles are used in estimating our true
endurance limit stress, which we have defined as the largest stress for which
the probability of surviving 10^7 cycles is one.

Our discussion has centered about the tail end of the distribution under
study, since establishment of an endurance limit is important to continued
studies of variables that appear in fatigue testing; but such information
is no better than the knowledge of its accuracy. This is not only a
statistical problem but also one of a lack of fatigue data adequate for
statistical interpretation, a shortage which should be alleviated now that
its existence is so evident.

There is still much research needed on the characteristics and behavior
of various extreme value distributions. We especially must increase our
knowledge of their behavior for small sample sizes. Also, the optimum number
of specimens that should be tested at any stress level is still a matter
to be decided, due to our lack of knowledge of the distribution of our
estimates of the parameters.

Knowledge of the parameters, their distribution, the effect of sampling
errors, confidence limits, etc. is essential before we can start on another
aspect of fatigue testing, which should be the ultimate aim of the study of
fatigue, i.e., the determination of a theory for predicting the behavior of
materials under repeated stress. There are various theories seeking to
explain fatigue from the viewpoint of engineering principles. These theories
develop under controlled experimentation and achieve what validity they have
by being statistically significant and physically consistent.

Successful study of such variables as position of failure, effect of
size and shape, the frequency of load cycles, and temperature all depend
to various degrees on the determination of the endurance limit.

Here,  statistics  has  a  two-fold  job. 

It must aid the design engineer in the design of parts or machines by
giving him a criterion concerning fatigue life or endurance limit on which
to base his analysis.

Then, it must develop as a tool which will enable the engineer and
metallurgist to better understand the phenomenon which is now referred to
by the general term, brittle behavior, and allow him to evaluate the effects
of introduced variations on metals.

THE USE OF A SPECIAL SYSTEMATIC DESIGN
FOR SURVEILLANCE TESTING

Robert  M.  Eissner 
Ballistic  Research  Laboratories 

In testing field artillery ammunition in order to evaluate its
ballistic quality, i.e., its exterior and interior ballistic characteristics,
range and velocity, separate loading ammunition presents us with a
different problem from that involved in testing fixed or semi-fixed
ammunition. In fixed or semi-fixed artillery ammunition, or any artillery
ammunition for that matter, we have what is called a complete round. This
complete round is composed of a fuze, a projectile, a propellant, and a
primer, all of which are packaged, stored and issued as one unit and can
be loaded into a weapon in one operation. A group of these units with
certain restrictions imposed upon it, e.g., being manufactured under
similar conditions within certain time periods, using only one propellant
lot, using not more than two primer lots and fuze lots, and using empty
projectile lots from only one manufacturer, comprises a complete round lot.
When a sample from this complete round lot is fired in the field, it can
be said that the measured range and velocity are characteristics of that
one, and I repeat one, complete round lot. Thus each complete round lot
as such in storage has a range and velocity. However, with separate
loading ammunition such is not the case. As might be suspected from the
name, each of the components, namely the shell and the propellant, is
packaged, stored and issued separately and is also loaded into a weapon
separately. Thus, since any propellant lot might be fired with any number
of projectile lots in the field and vice versa, the concept of a measured
range and velocity for each complete round lot of separate loading
ammunition in storage does not exist. The propellant as a separate item of
issue has its characteristic, velocity, and the shell as a separate item
of issue has its characteristic, range.

Now in surveillance testing it is desired that the quality of each
lot in storage, whether it be a lot of fixed ammunition, semi-fixed
ammunition, separate loading projectile or propellant, be evaluated. To do
this, lots of a given type of ammunition are periodically sampled and fired
in some manner in order that those characteristics (range, velocity,
functioning, etc.) which are needed to ascertain the quality of a lot may
be obtained. Upon obtaining these characteristics, say mean range, standard
deviation in range, mean velocity, standard deviation in velocity, number
of duds, number of low order functionings, etc., a lot may be assigned one
of four grades by using a set of previously established Lot Quality
Standards. Thus, in this manner the quality or grade of the individual lots
in storage may be evaluated. However, in addition to this it is also desired
that over-all estimates of the round-to-round and lot-to-lot dispersions
for a particular ammunition type be obtained. Such information is of
great benefit to the using field forces, those people involved in preparing
firing tables, and those people involved in weapons systems analyses.

With this brief description regarding surveillance testing of artillery
ammunition, the problem involved in testing separate loading ammunition
may be clearly seen: how can we fire an economically feasible test and
still get the desired results mentioned previously?

In answering this question it must first be realized that it is most
difficult, in fact almost impossible, to control all the extraneous
factors that may affect a ballistic test. In no way is a ballistic test like
a laboratory experiment where most of the factors can be rigidly controlled.
Weather conditions, tube conditions, etc., once a test has started just
cannot be controlled. Consequently, in a surveillance test, we must be
sure that we're getting unbiased estimates of the parameters needed
to grade a lot, i.e., that we're getting estimates that actually reflect
differences in lots and not differences due to methods of test or other
extraneous factors. We want to be sure that we will not penalize or
downgrade any lot for any reason other than inferior performance. For these
reasons then it is necessary that we make use of designed experiments
and/or reference lots, or standard lots as they are often called. In this
way we hope to eliminate or minimize any extraneous factors and to estimate
the parameters for each lot with equal precision. Having all this
information, there are two general methods of test that can be employed in
order to get the desired results: those in which test propelling charge lots
and test shell lots are fired in the same program, and those in which test
propelling charge lots are fired with the reference shell lot (the reference
shells all being loaded to the prescribed standard weight) in one program
and test shell lots are fired with the reference propellant lot in another.
One word here on what is meant by a reference shell lot or a reference
propellant lot. A reference lot is that lot which has been standardized
and fires a known or firing table value when fired under standard
conditions, i.e., standard meteorological conditions, new gun tube, standard
propellant temperature, etc. Generally extensive firings using a number
of different tubes on each of several days have been conducted on these
reference lots in order that the greatest possible amount of information
about the lot is available. Now getting back to the methods of test, the
first method, the one in which the test propelling charge lots and the
test shell lots are fired in the same program, is much more economical.
In fact it involves only about half as much firing as does the second
method. In addition it also more nearly approaches actual field firing
conditions, where a mixture of propelling charge lots and shell lots may
be fired during the same mission although they are not supposed to be fired
in that manner. The second method, however, is a less complicated
procedure and gives estimates of mean range and/or velocity and standard
deviation of range and/or velocity better suited for surveillance purposes,
i.e., grading of the individual lots. It is also the procedure generally
followed in the acceptance tests of the ammunition.

Now that we have given these general descriptions of the two methods,
let us discuss them in more detail. For programs of the first type,
various combinations of the different shell lots and the different charge
lots are made into complete rounds as defined previously. Included among
the different shell lots is the reference shell lot and included among the
different propellant or charge lots is the reference propellant lot. These
reference lots enable us to tie in the results from this test with those
from previous or future tests. They serve as control lots and
theoretically take out any day-to-day or occasion-to-occasion effects.
Getting back to the design, two complete rounds from each of the possible
combinations of charge lots and shell lots (by this I mean each propellant
lot is combined with every one of the shell lots and vice versa) are fired
in pairs as a two factor experiment. Diagrammatically the design for, say,
four test lots of shell and four test lots of propellant looks something
like this:


                                      Shell Lot
Propellant Lot     Ref. Shell   Lot 1      Lot 2      Lot 3      Lot 4

Ref. Propellant    1a, 1b       6a, 6b     11a, 11b   16a, 16b   21a, 21b
Lot A              22a, 22b     2a, 2b     7a, 7b     12a, 12b   17a, 17b
Lot B              18a, 18b     23a, 23b   3a, 3b     8a, 8b     13a, 13b
Lot C              14a, 14b     19a, 19b   24a, 24b   4a, 4b     9a, 9b
Lot D              10a, 10b     15a, 15b   20a, 20b   25a, 25b   5a, 5b

The numbers shown in the cells refer to the sample round numbers.
For example, sample rounds 1a and 1b consist of the reference shell and
reference propellant, sample rounds 2a and 2b consist of shell from lot 1
and propellant from lot A, etc. Regarding the order of fire, the first
group of ten rounds (Nos. 1a thru 5b) is fired first, followed by the
second group of ten rounds (Nos. 6a thru 10b), etc. until all five groups
of ten rounds are fired. Within each group of ten rounds, however, the
sets of two samples are fired in a random order. For example, the first
group of ten rounds could be fired as follows: 3a, 3b, 5a, 5b, 2a, 2b,
1a, 1b, 4a, 4b.
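
The numbering of the cells and the order of fire described above can be
sketched in a few lines of Python. The rule in `round_number` is inferred
here from the pattern of the table (each group of five numbers runs down one
diagonal, so every shell lot and every propellant lot appears once per
group); it is an assumption for illustration, not a formula given in the
paper. The function names and the random seed are likewise hypothetical.

```python
import random

SHELLS = ["Ref. Shell", "Lot 1", "Lot 2", "Lot 3", "Lot 4"]
PROPELLANTS = ["Ref. Prop.", "Lot A", "Lot B", "Lot C", "Lot D"]


def round_number(row, col, size=5):
    """Sample round number for propellant row and shell column (0-based)."""
    return row + 1 + size * ((col - row) % size)


def firing_order(rng, size=5):
    """Fire the groups in sequence, shuffling the pairs within each group."""
    order = []
    for group in range(size):
        pairs = list(range(group * size + 1, group * size + size + 1))
        rng.shuffle(pairs)  # random order of the two-round sets in the group
        for number in pairs:
            order += [f"{number}a", f"{number}b"]
    return order


# Print the layout row by row, then one randomized order of fire.
for r, prop in enumerate(PROPELLANTS):
    print(f"{prop:<10}", [round_number(r, c) for c in range(len(SHELLS))])
print(firing_order(random.Random(1956)))
```

Only the order of the pairs inside a group is randomized; the groups
themselves, and the two rounds of each pair, stay in sequence, as in the
example order quoted in the text.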


At first it was intended to fire the program as a Latin Square. As
you can see, shell lots, propellant lots, and order of fire would be the
three factors. However, the order of fire for any group of pairs was
randomized, thus destroying one of the underlying conditions of the Latin
Square design: that each treatment occurs once and only once in each row
and each column. This was done in order to preclude any possibility of
a memory effect that may come about from an ordered design. To digress
once again, by memory effect is meant the effect on lot B due to the fact
that it always follows lot A in the firing sequence. These memory effects,
which usually invalidate the data for a program, are constant hazards in
any ballistic test since they may be caused by any number of seemingly
unimportant factors, for example, small changes in the chemical composition
or web size of the propellant. Two classic examples of such memory effects
occurred in the 90mm gun. One case occurred during World War II and was
caused by the addition to the propellant of a small amount of potassium
sulfate to suppress flash. The effect of this small change was that when
a sulfated propellant and a non-sulfated propellant were fired alternately
in a relatively new tube the non-sulfated rounds were depressed from the
normal by about 20 f/s in velocity whereas the sulfated rounds fired
correspondingly higher. Since propellants are assessed in this manner,
i.e., by alternately firing the test propellant and the standard propellant,
the assessment, or the charge weight that will enable the propellant to
fire the required or service velocity, of many 90mm non-sulfated propellant
lots was in error by about 40 f/s due to the fact that they were assessed
against a sulfated reference propellant. The second case occurred during
the Korean War and was very similar in nature. It involved a 10% change in
the web size of the propellant. The web size of the test propellants was
increased by 10% whereas the web size of the standard propellant was not
changed. This too resulted in approximately a 40 f/s error in velocity for
the test propellants. In case you're interested, both situations were
remedied quickly by standardizing a new reference propellant which had the
same physical properties as the test propellants being produced.

Now getting back to our discussion, for programs of the second type, the
different propellant lots are assembled into complete rounds with the
reference shell lot when propellant lots are being tested, and the
different shell lots are assembled into complete rounds with the reference
propellant lot when shell lots are being tested. These complete round lots
are then fired in a series of five-round groups in a manner determined by
the number of lots being tested. For example, if three test lots are being
tested the firing sequence would be reference lot, test lot 1, test lot 2,
test lot 3, reference lot; if four test lots are being tested the firing
sequence would be the same as that above except that four groups of test
lots would be fired between the reference groups; if six test lots are
being tested the firing sequence would be reference lot, test lot 1, test
lot 2, test lot 3, reference lot, test lot 4, test lot 5, test lot 6,
reference lot; etc. In each of these cases the sequences would be fired a
second time in order that ten rounds from each test lot would be fired.
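The firing sequences just described can be sketched in a few lines of
Python. This is only an illustration: the function names and the rule for
splitting test lots into blocks (near-equal blocks of at most four lots
between reference groups) are assumptions chosen to reproduce the three
examples given above, not a rule stated in the text.

```python
from math import ceil


def firing_sequence(n_test_lots, max_block=4):
    """Return one pass of the firing order as a list of group labels.

    Assumed rule: test lots are split into near-equal blocks of at most
    max_block lots, and every block is bracketed by reference groups.
    """
    if n_test_lots < 1:
        raise ValueError("at least one test lot is required")
    n_blocks = ceil(n_test_lots / max_block)
    base, extra = divmod(n_test_lots, n_blocks)
    sequence = ["reference lot"]
    lot = 1
    for block in range(n_blocks):
        size = base + (1 if block < extra else 0)
        for _ in range(size):
            sequence.append(f"test lot {lot}")
            lot += 1
        sequence.append("reference lot")
    return sequence


def rounds_per_test_lot(passes=2, rounds_per_group=5):
    """Each pass fires one five-round group per test lot."""
    return passes * rounds_per_group
```

Calling firing_sequence(6) gives reference lot, test lots 1 to 3,
reference lot, test lots 4 to 6, reference lot, and two passes of
five-round groups give the ten rounds per test lot mentioned above.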

In firing each of these designs certain other control mechanisms are used
in order to minimize any extraneous effects that would bias the results.
These mechanisms include the use of only one gun tube throughout the
program, storing the ammunition at a constant temperature of 70°F for
approximately 24 hours prior to firing, firing conditioning rounds of the
same type and composition as the test rounds before any of the test rounds
are fired in order to get the gun tube in the proper frame of mind, so to
speak, using the same lot of fuzes and the same lot of primers throughout
the program, and firing any one phase of the program on one day without
cessation or any undue delay.

With this description of the practices and procedures involved in the
ballistic testing of ammunition you have become acquainted with two methods
of testing separate loading ammunition: that method which we shall call
Method 1, where test charge lots and test shell lots are fired in various
combinations in the same program as a two factor experiment, and that
program which we shall call Method 2, where the test shell lots are
assembled with the reference propellant lot and fired in one program and
the test propellant lots are assembled with the reference shell lot and
fired in another. The first method better simulates field firing conditions
and is more economical, whereas the second method is more easily
accomplished and gives results better suited for surveillance purposes.


In order that we may make a comparison of the two methods of test, a
program has been fired involving four test lots of M4A1 155mm Howitzer
propelling charges and four test lots of HE M107 155mm Howitzer shell. Ten
rounds from each of the test lots were fired in each of the three zones,
III, V, and VII. In firing Method 2, however, only that phase involving the
firing of the test propelling charge lots with the reference shell lot was
conducted. For this reason only the characteristic muzzle velocity is
considered in making the comparison. Comparing the results of the two
methods after analyzing the data from each, we have the following:

                            Method 1        Method 2

CHARGE III
   Avg. Vel.                873.0 f/s       86[?].0 f/s
   Rd-to-Rd Std. Dev.       7.98 f/s        5.20 f/s
   Lot-to-Lot Std. Dev.     6.71 f/s        5.03 f/s

CHARGE V
   Avg. Vel.                [illegible]     1210.5 f/s
   Rd-to-Rd Std. Dev.       4.47 f/s        4.00 f/s
   Lot-to-Lot Std. Dev.     2.87 f/s        1.65 f/s

CHARGE VII
   Avg. Vel.                [illegible]     1814.8 f/s [?]
   Rd-to-Rd Std. Dev.       4.10 f/s        3.02 f/s
   Lot-to-Lot Std. Dev.     1.11 f/s        2.11 f/s

In each of the charges it is observed that the average velocity of the lots
obtained from the first method is larger than the average velocity of the
lots from the second method. In fact, in each case the average velocity
obtained using method one is significantly greater. It is likewise observed
that the round-to-round standard deviation in velocity obtained from the
first method is greater than that obtained from the second method. In this
case, however, only in Charges III and VII is the round-to-round standard
deviation obtained from the first method significantly greater. In no case
are the lot-to-lot standard deviations significantly different.
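A comparison of two standard deviations of this kind is usually made
through the ratio of the corresponding variances. The sketch below only
forms that ratio for the round-to-round figures quoted above; it is not
the author's computation. The degrees of freedom behind each estimate are
not stated in the text, so no significance levels are attempted, and
several of the Charge V and Charge VII figures are read from a damaged
copy.

```python
def variance_ratio(sd_method_1, sd_method_2):
    """Ratio of variances (Method 1 over Method 2) from two standard deviations."""
    return (sd_method_1 / sd_method_2) ** 2


# Round-to-round standard deviations in f/s: (Method 1, Method 2).
round_to_round_sd = {
    "III": (7.98, 5.20),
    "V": (4.47, 4.00),
    "VII": (4.10, 3.02),
}

ratios = {
    charge: variance_ratio(sd_1, sd_2)
    for charge, (sd_1, sd_2) in round_to_round_sd.items()
}

for charge, ratio in ratios.items():
    print(f"Charge {charge}: variance ratio = {ratio:.2f}")
```

The ratio is largest for Charge III and smallest for Charge V, which
agrees in direction with the statement that only Charges III and VII
showed a significant difference.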

Having observed these results, the question comes to mind: why are the
results from the two methods different? Just why should method one give
larger round-to-round dispersions than those of the more commonly used
second method? In an attempt to answer this question we will further
analyze the first method since by the nature of its design, as opposed to
the simplicity of the second method, it more readily lends itself to
extensive analysis. Analyzing it first as a two-way classification with two
observations per cell, it was observed that in all three charges there was
a highly significant shell and propellant interaction effect. This was
rather surprising since the test had been designed under the supposition
that any such effect would be negligible. To investigate the possible
causes of this interaction effect and also possibly throw some light on the
differences in the results for the two methods, we made several corrections
to the data. These corrections were made to account for known differences
between the two methods. The first correction made was that for differences
in shell weights. It will be remembered that in the second method reference
shell all loaded to the prescribed standard weight are used, whereas in the
first method test shells loaded to various weights are used. Thus
correcting each velocity for the variation of the shell weight from the
standard weight would take out any effect due to shell weight. Making this
correction, we found, as expected, had no significant effect, in fact
hardly any effect at all, on the results of the two-way classification. The
second correction was that for velocity trend. As more rounds are fired
from a tube the velocity level of the tube usually becomes lower. This is
generally more true of high velocity weapons and is not considered of too
great importance when firing the smaller caliber howitzers, especially when
firing only a fifty round group. However, since we are interested in
investigating all the possibilities, we estimated the velocity trend using
the analysis of covariance and then removed any trend found from the data.
Doing this reduced the interaction effect in each case and in some cases
even made it insignificant. Based on this result, the velocity trend
evidently did cause some of the interaction. However, neither it nor the
shell weight correction had any effect on the round-to-round standard
deviation and very little effect on the average velocity.
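The trend correction can be illustrated with a small sketch. It assumes
the simplest covariance adjustment: a single straight-line trend of
velocity on firing order, estimated from the pooled within-lot sums of
squares and products and then subtracted from every round. The function
names and the data are hypothetical; the text does not give the exact form
of the adjustment used.

```python
def pooled_within_slope(orders, velocities, groups):
    """Pooled within-group least-squares slope of velocity on firing order."""
    by_group = {}
    for x, y, g in zip(orders, velocities, groups):
        by_group.setdefault(g, []).append((x, y))
    sxy = sxx = 0.0
    for pairs in by_group.values():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def remove_trend(orders, velocities, groups):
    """Return the estimated slope and velocities adjusted to the mean firing order."""
    slope = pooled_within_slope(orders, velocities, groups)
    mean_order = sum(orders) / len(orders)
    adjusted = [y - slope * (x - mean_order) for x, y in zip(orders, velocities)]
    return slope, adjusted


# Hypothetical data: two lots fired alternately, velocity falling 0.5 f/s per round.
orders = [1, 2, 3, 4, 5, 6, 7, 8]
groups = ["A", "B", "A", "B", "A", "B", "A", "B"]
lot_level = {"A": 870.0, "B": 875.0}
velocities = [lot_level[g] - 0.5 * x for x, g in zip(orders, groups)]

slope, adjusted = remove_trend(orders, velocities, groups)
```

With these made-up figures the estimated slope is -0.5 f/s per round, and
after adjustment every round of a given lot has the same velocity, so the
trend no longer inflates the apparent differences between lots fired early
and late.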

Thus in view of these results no light can be shed as to the reasons for
the larger dispersions and higher velocities of the first method other than
that of the difference in the experimental errors in the two test
procedures. Therefore, unless some physical means of evaluating the
magnitude of this difference is obtained, the only way the first method can
be used in order to assign grades to the individual lots without
unnecessarily penalizing them is to have the Lot Quality Standards and
Criteria take into account such increases and be based upon experimental
data from tests of the first type. In this way the more economical first
method could be used and individual lot grades could still be assigned.

To summarize: having given you a brief description of the difference
between separate loading and fixed and semi-fixed ammunition, and also
having given you the main purposes of surveillance testing, that of grading
individual lots and providing over-all estimates of dispersion for
different types of ammunition, you were made aware of the problem involved
in surveillance testing separate loading ammunition: how to economically
and realistically test separate loading ammunition and still get results
that may be used to achieve the purposes of surveillance testing. To
accomplish this, because of the many extraneous factors that may affect
ballistic tests, designed programs and reference lots had to be used. Two
such kinds of programs were given: program or method one involved firing
test propelling charge lots and test shell lots in the same design, whereas
program or method two involved firing the test propelling charge lots with
the reference shell in one phase and the test shell lots with the reference
propellant in the other. Programs of the first type were more economical
and more nearly characterized the manner in which separate loading
ammunition was fired in the field; programs of the second type gave results
which were better suited for grading individual lots. The results from a
program comparing the two methods were given. These results showed that
programs of the first type gave in most cases significantly larger
round-to-round standard deviations and significantly greater average
velocities. No explanation for these increases was found, although velocity
trend appeared to play a significant role with respect to the interaction.
Therefore, based on the findings of the special program, it was concluded
that the only way in which the more economical and realistic first method
could be used in order to assign grades to the individual lots without
unnecessarily penalizing them was to have the Lot Quality Standards and
Criteria take into account such increases in experimental error and be
based upon experimental data from tests of that type.


A STATISTICAL DESIGN FOR A SURVEILLANCE TEST


Boyd  Harshbarger* 

Redstone  Arsenal  and  Virginia  Polytechnic  Institute 


An example may serve to show how the problem of surveillance can be
attacked through statistical design. We will discuss a portion of a
well-designed experiment carried out by the Rocket Development Group at the
Redstone Arsenal. The variable concerning us in this talk is the time to
spontaneous ignition in the sample tested. This was one of several
variables measured in the study. The other variables were strand burning
rate and X-ray diffractometric analysis of oxidizers on the surface of the
sample. A study was made on the sizes of the variances and means before and
after running each test. This study served to detect a shift in the means
as well as to measure variability due to the techniques, equipment, and
personnel.


The observations follow the usual linear model:

$$ y_{ijkl} = \mu + \alpha_i + \gamma_j + \delta_k + (\alpha\gamma)_{ij} + \cdots + \epsilon_{ijkl} $$

where $\mu$ is the overall mean, $\alpha_i$ is the added effect of the
$i$th sample, $\gamma_j$ is the added effect of the $j$th week, $\delta_k$
is the added effect of the $k$th environment, $(\alpha\gamma)_{ij}$ is the
added effect of the interaction of the $i$th sample and the $j$th week,
..., and the $\epsilon_{ijkl}$ are random errors, independently and
normally distributed with zero means and common variance $\sigma^2$. The
important things to observe here are that we are dealing with fixed or
named effects, that our model is a linear one, and that the model includes
a factorial. In a factorial experiment, the effects of a number of
different factors as well as their independence are investigated
simultaneously. In reality the environments are further separated into
helium and oxygen and each at two different temperatures. All this
modification does to the model is to add several terms.


It is easily shown that the least squares solution of the linear model
gives estimates of the various effects and also provides the basis for an
analysis of variance. This analysis of variance provides a test of
significance in which one compares the random error with the treatment and
interaction effects.


The chemists see the objectives of the experiment as:

(a) To compare the behavior of the basic samples designated as D and U
over a period of time.

(b) To establish the week-to-week trend, if it exists.

(c) To compare the effects of two different sets of environments, helium
and oxygen.

(d) To investigate the effects of temperature.

(e) To study the interaction or independence of the main effects.


* The author acknowledges the help of Lt. L. hombara and the supplying of
the data by Mr. R. L. Rudolph, both of Redstone Arsenal.


The data that were gathered to answer these questions are given in
Table I.


The statistician attempts to show a mathematical model and analyses which
will enable the chemist to answer his questions on a probability basis. In
general, this involves the setting up of a number of so-called null
hypotheses, which may or may not be rejected.

Table I gives the time in seconds to spontaneous ignition for the samples
tested.

TABLE I

Time to Spontaneous Ignition for the Tested Samples

                 U                                  D
            ENVIRONMENTS                       ENVIRONMENTS
      Helium          Oxygen             Helium          Oxygen
   a 70°   b 120°   c 70°   d 120°    a 70°   b 120°   c 70°   d 120°

89.1 

89.4 

89.7 

75.2 

AFTER 

86.6 

94.5 

85.9 

73.4 

ONE 

89.9 

93.7 

86.8 

74.5 

88.8 

92*8 

86.2 

84.7 

WEEK 

85.5 

90.0 

91.0 

77.5 

94.1 

92.2 

■  92.8 

83.6 

I 

87.7 

H7V8 

87.8 

91.1 

84.4 

873 

72.8 

TV? 

90.0 

9170 

92.3 

92?4 

.90.4 
•  89.0 

79.8 

a277 

86.7 

92.1 

85.3 

76.6 

AFTER 

89.4 

88.9 

84.3 

73.6 

TWO 

84.5 

90.4 

80.6 

74.4 

WEEKS 

87.7 

89.0 

78.3 

75.4 

X 

87.1 

87a 

22x2 

90.2 

ZM 

8177 

72.5 

74l5 

96.4 

.83.3 

81.6 

94.1 

85.7 

81.1 

98.4 

9&3 

824 

8!3 

79.4 

853 

90.2 

86.9 

83.4 

76.2 

AFTER 

88.7 

84.3 

82.3 

71.8 

THREE 

86.7 

92.9 

82.1 

67.7 

WEEKS 

87.7 

87.8 

80,7 

70.9 

1 

88,1 

80.3 

85.1 

8?74 

83.0 

8S3 

77.4 

TO 

91,8  97*3  90.0  84.8 
94.3  96.5  86.1  79.0 


90.2 

81.6 

79.7 

67.3 

AFTER 

90.5 

89.1 

77.2 

65.5 

FOUR 

88,1 

87.4 

78.9 

66,1 

WEEKS 

87.3 

86.9 

82.9 

62.9 

87,2 

86.1 

74.6 

67.3 

95.5 

94.2 

91.4 


93.7 


91.1 

91.1 


86.5  71.6 

81.9  73.1 

84.9  69.4

The usual calculations on the data from Table I are now made to give the
analysis of variance. Under the column "source of variation" are shown the
several types of variation, and opposite these names, under the column
headed "mean square," are given comparable estimates of these variations.
The quantity opposite "error" under the mean square is an estimate of
random variation. Comparison between the "error" and the other mean squares
is used to produce a test of significance. Table II gives the analysis of
variance.

TABLE II

Analysis of Variance of Time to Spontaneous Ignition

Source of Variation              d.f.   Sum of Squares   Mean Square

U vs D                             1         817.20         817.20*
Weeks                              3         388.57         129.52*
   Linear                          1         325.76         325.76*
   Quadratic                       1          27.57          27.57*
   Cubic                           1          35.25          35.25*
Environments                       3        5116.17        1705.39
   Temp's                          1         504.83         504.83*
   He vs O2                        1        3644.45        3644.45*
   Temp's x He vs O2               1         966.89         966.89*
U vs D x Weeks                     3          34.53          11.51
U vs D x Environments              3          58.11          19.37
Weeks x Environments               9         444.96          49.44*
   Weeks x Temp's                  3         172.30          57.43
   Weeks x He vs O2                3         256.80          85.60
   Weeks x He vs O2 x Temp's       3          15.86           5.29
U vs D x Weeks x Envrs             9          64.94
Error                             96         534.92
TOTAL                            127        7459.40

Means are presented in Tables III, IV, and V. Table II gives some
indication as to the significant trends of these tables and also indicates
which trends can be dismissed as purely random variation.

Time in weeks seems to affect both samples, U and D, in the same manner.
The environments, however, vary from week to week in a different manner for
the separate conditions a, b, c, and d. Temperature affects the time to
spontaneous ignition differently in helium than in air.

The week-to-week variations show a linear trend, but not sufficiently that
the remaining variation is non-significant. The two samples, U and D, gave
different times to spontaneous combustion. By looking at the analysis of
variance table, one can see other variations that are significant.
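The arithmetic behind an analysis of variance table of this kind is easy
to check by machine. The sketch below divides each sum of squares from
Table II by its degrees of freedom and compares the resulting mean square
with the error mean square. The degrees of freedom used here are
assumptions: 96 for error is stated later in the text, and the others are
inferred from the ratios of the printed sums of squares to the printed mean
squares.

```python
# (source, assumed degrees of freedom, sum of squares from Table II)
table_ii = [
    ("U vs D", 1, 817.20),
    ("Weeks", 3, 388.57),
    ("Environments", 3, 5116.17),
    ("U vs D x Weeks", 3, 34.53),
    ("U vs D x Environments", 3, 58.11),
    ("Weeks x Environments", 9, 444.96),
    ("U vs D x Weeks x Environments", 9, 64.94),
]
error_df, error_ss = 96, 534.92
error_ms = error_ss / error_df


def f_ratios(rows, error_mean_square):
    """Mean square of each source divided by the error mean square."""
    return {name: (ss / df) / error_mean_square for name, df, ss in rows}


ratios = f_ratios(table_ii, error_ms)
total_ss = sum(ss for _, _, ss in table_ii) + error_ss

for name, ratio in ratios.items():
    print(f"{name:32s} F = {ratio:8.2f}")
```

The component sums of squares add up to the printed total of 7459.40, and
the error mean square comes out close to 5.57, which is the divisor that
reproduces the F ratios printed in Table VI.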

TABLE III

                     Weeks
U vs D      1       2       3       4       Avg

U          85.3    83.4    82.?    79.8    82.8

Avg        86.7    85.2    85.2    81.9


TABLE IV

Environments

Avg        89.3


TABLE V

                  Environments
U vs D      a       b       c       d       Avg

U          87.9    88.7    82.6    72.0    82.8
D          91.6    94.4    86.8    79.2    88.0

Avg        89.3    90.8    84.1    74.4
By extending the analysis of Table II, some revealing facts can be shown,
as indicated in Table VI.

TABLE VI

Source                     d.f.       SS          MS          F

U vs D                       1      817.19      817.19     146.71**
Environments                 3     5116.17     1705.39     306.17**
   Temp with He              1       37.21       37.21       6.68*
   Temp with Oxy             1     1434.51     1434.51     257.54**
   Gases
   (Helium vs Oxygen)        1     3644.45     3644.45     654.30**
Weeks within a               3       18.68        6.23       1.12
   Linear                    1       12.38       12.38       2.22
   Quadratic                 1        6.04        6.04       1.08
   Residual                  1         .26         .26
Weeks within b               3       75.08       25.03       4.49**
   Linear                    1       50.06       50.06       8.99**
   Quadratic                 1       24.32       24.32       4.37*
   Residual                  1         .69         .69
Weeks within c               3
   Linear
   Quadratic
   Residual
Weeks within d               3
   Linear
   Quadratic
   Residual
U vs D x Weeks               3
U vs D x Envr                3
U vs D x Week x Envr         9

In Table VI, the variation is separated so as to show separately the
variation of weeks in the four different environments. Weeks within
environment (a) is not significant, but when heat is applied to produce
environment (b), a variation between weeks is noted. The variation between
weeks within environment (c), which is at ambient temperature and in
oxygen, is greater than the variation between weeks within environment (b)
but is still less than the variation noted between weeks within environment
(d), which is at the higher temperature. The pattern for this analysis of
variance shown in Table VI is useful in many factorial experiments.
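The separation of a weeks effect into linear, quadratic, and cubic parts
is commonly done with orthogonal polynomial contrasts. The sketch below
applies the standard coefficients for four equally spaced levels to the
four weekly averages printed in Table III. It assumes 32 observations
behind each weekly average (2 samples x 4 environments x 4 replicates),
which is inferred from the 96 error degrees of freedom and is not stated in
the text.

```python
# Standard orthogonal polynomial coefficients for four equally spaced levels.
COEFFICIENTS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def contrast_sum_of_squares(means, coefficients, n_per_mean):
    """Sum of squares with one degree of freedom for a contrast among means."""
    value = sum(c * m for c, m in zip(coefficients, means))
    return n_per_mean * value ** 2 / sum(c * c for c in coefficients)


week_means = (86.7, 85.2, 85.2, 81.9)   # "Avg" row of Table III
n_per_week = 32                          # assumed, see the lead-in

components = {
    name: contrast_sum_of_squares(week_means, coeffs, n_per_week)
    for name, coeffs in COEFFICIENTS.items()
}
```

Under these assumptions the components come out at roughly 332, 26, and
37, against 325.76, 27.57, and 35.25 in Table II. The gap is of the size
one would expect from working with averages rounded to one decimal place.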

The analysis of variance was run on the logarithms of the estimated
variances (s²) calculated from the within sample variations for both sample
U and sample D. There was no significance noted in either analysis of
variance of variances.

There may be some objection to considering the mean square with ninety-six
degrees of freedom as an experimental error, inasmuch as it has many
characteristics of a sampling error. A more realistic experimental error
may be obtained by using the first and second interaction terms. This error
would involve the interaction of weeks, environments, and the second order
interaction of weeks and environments with the differences between the
samples. It appears reasonable to assume that the interaction of weeks and
environments with the differences between samples will be a random variable
and thus give an estimate of true error. For Table VI the error term would
be 10.50 with nine degrees of freedom. A chemist is primarily interested in
the types of curves and the estimate of residuals from these curves. It can
be seen that in the environment with helium, linear and quadratic trends
account for most of the variation. In oxygen there is a different picture,
as no specific trend appears and the significant variation between weeks is
accounted for by the results for the last week.


MONTE CARLO AND OPERATIONAL GAMING IN ORDNANCE RESEARCH


L. M. Court

Diamond Ordnance Fuze Laboratories

It has been said that the proper role of a meeting chairman is to serve the
needs of his audience; he should not obtrude on the speakers or the
discussion but confine himself to listening. Briefly, he is a sort of
program traffic director. If this is the case, then in submitting this
"post mortem" comment on what went on at one of the sessions, the writer is
sinning against the code of good conduct for chairmen. His only excuse is
the importance of the topics to be touched on, Monte Carlo and Operational
Gaming, together with the fact that operational gaming is the heart and
substance of the first of the three papers presented under his
chairmanship, and the fact that Monte Carlo is the technique that resolves
the central problem of another paper.

Both Monte Carlo and operational gaming have burgeoned in the era since von
Neumann and Morgenstern wrote their classic on the theory of games and the
modern high speed electronic computer became a practical operating device;
indeed, von Neumann himself, in company with another mathematician, Ulam,
is responsible for the Monte Carlo idea in its modern version, as it is
currently being exploited by physicists and operations research analysts,
although the ancestry of the idea can be traced back at the very least to
the time of Buffon and his celebrated needle problem. Allowing for the
brief decade or so that Monte Carlo has been pursued, a not inconsiderable
literature has grown up about it, although the bulk of the published
material busies itself with actual examples rather than broad theory.
Would-be enthusiasts, who recognise the power of the method but are
otherwise uninitiated, justly complain that a satisfactory introduction is
hard to come by. The truth is that Monte Carlo is in its sheerest infancy,
and many problems remain to be resolved; e.g., what is the full gamut of
mathematical and physical phenomena that, although not intrinsically
stochastic, or at first sight so, are somehow reducible to this form? We
know that Laplace's equation can be approximated by a linear difference
equation representing a random walk problem in which the probability that
the particle will move from any grid point to any of the six neighboring
grid points is the same, rendering the equation amenable to the Monte Carlo
treatment; also that Fermi suggested long ago (as measured in "Monte Carlo
era" time units) that this technique be applied to the wave equation, which
is essentially a modified Laplace equation. But does every differential
equation have to be linear if it is to submit to the Monte Carlo technique,
etc.? The question we have posed is a sweeping one. The truth is, once
again, that Monte Carlo is so young that any innovation is to be valued,
even when it is not strictly new but merely "sees" already familiar matters
in a fresh light.
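The random walk treatment of Laplace's equation mentioned above can be
sketched as follows. This is a minimal illustration, not a method taken
from any of the papers: the cubic grid, the function names, and the
boundary values are all assumptions. A particle starts at an interior grid
point, steps to one of its six neighbors with equal probability until it
reaches the boundary, and the boundary value found there is recorded; the
average over many walks estimates the solution at the starting point.

```python
import random

# The six equally likely moves on a three-dimensional grid.
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def walk_to_boundary(start, size, rng):
    """Random walk from an interior point of the cube 0..size to its boundary."""
    x, y, z = start
    while 0 < x < size and 0 < y < size and 0 < z < size:
        dx, dy, dz = rng.choice(STEPS)
        x, y, z = x + dx, y + dy, z + dz
    return x, y, z


def estimate_potential(start, size, boundary_value, n_walks=2000, seed=0):
    """Average the boundary values reached by n_walks independent walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        total += boundary_value(*walk_to_boundary(start, size, rng))
    return total / n_walks


# Hypothetical boundary condition: the potential equals the x coordinate.
estimate = estimate_potential((2, 3, 3), 6, lambda x, y, z: float(x))
```

The boundary condition used in the example is itself a solution of the
difference equation, so the estimate should settle near the x coordinate of
the starting point, here 2, with a sampling error that shrinks as the
number of walks grows.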

A great virtue of Monte Carlo, apparent to its innovators, von Neumann and
Ulam, is that it provides a means for subduing complex problems (including
some whose formal mathematical solution has been accomplished) that perplex
us on the practical, application level because, if traditional "hand"
methods of computation are applied to them, numerical results are
unconscionably slow in forthcoming. Monte Carlo is thus a scheme for
bringing the enormous power of the electronic computer to bear on problems.
There are other such schemes. A given machine has certain potentialities
for managing problems, these being determined by the available schemes for
"laying out" problems and the engineering of the machine, and by the two in
conjunction. The simpler of these schemes were probably in the mind of the
machine's designer when he was diagramming its circuitry, but whenever a
new scheme is invented, it may enhance the potentialities of existing
machines as well as those yet to be constructed. Monte Carlo furnishes a
grand strategy for attacking a certain species of problems, the processes
built into the machine being the tactics for realising the strategy.

Viewed in this fashion, Monte Carlo is a more generalised form of coding,
more powerful than orthodox computer coding because, in the sequence of
devices leading from a problem's formulation to its practical solution, it
comes earlier, and the "leverage" a device provides is roughly proportional
to its priority of application.

Another virtue of Monte Carlo is its ability to pass in review before our
eyes a vast variety of configurations emanating from a manifold process;
configurations which, because of their diversity and numerousness, would
take us years to experience in the real-life setting of the process. It is
this property of the method that Professor Morse of M.I.T. values so highly
for use in Operations Research. It is the amazing rapidity of the
electronic computer that makes this a practical possibility.


To summon up the configurations, the process is analysed into its elements,
out of which the intrinsic ones, those from which the process can be
reproduced without doing violence to its nature, are singled out for
consideration. As a matter of practical computation, the number of these
intrinsic elements should not be excessive, since they join by combination
to produce the configurations, and we know that in the combinatorial
arithmetic applying to such situations, numbers mount very rapidly for
small changes in the values of the controlling variables.

Thus even if the number of distinct "forms" or "manifestations" that each
element can assume is only two and there are n intrinsic elements, the
resulting number of configurations is already 2^n (already 1024 when
n = 10). Actually, each element is, as a rule, capable of many more
"manifestations", often a continuous (infinite) array of them, and there is
a frequency distribution specifying the probabilities with which they are
assumed. In the usual case the mode of combination of the intrinsic
elements to form the configurations is interdependent, so that these
univariate distributions (strictly, their random variables) are not
statistically independent, but conditional probability is always
troublesome to work with, and as a practical measure we can overlook this
dependence if it is not too large.

The procedure is then as follows: for each element we use an independent
game of chance (a table of random numbers, if you will) based on the
element's underlying distribution to pick the particular "manifestation"
that is revealed at the moment, the different simultaneous "manifestations"
being combined to give a particular configuration. By continuing to "spin"
our roulette wheel or game of chance, the great variety of configurations
will sooner or later come up, and this with the same relative frequencies
with which they would be generated by the process in real life. (Subject,
of course, to the approximations we have allowed: the substitution of a
small number of intrinsic elements for the totality and independent
distributions for interdependent ones.) In practice we are embarrassed by
the richness of configurations thrown up and an electronic computer is
required to keep track of them. If some mode or average of the
configurations is required, the computer can obtain it for us while
"auditing" them.

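The procedure just described can be sketched in a few lines. The elements,
their "manifestations", and the probabilities below are all hypothetical,
and each element is sampled independently, which is exactly the
approximation the text allows. The computer's "auditing" is represented by
a tally of the configurations as they come up.

```python
import random
from collections import Counter


def sample_configuration(elements, rng):
    """Pick one manifestation per element, each from its own distribution."""
    return tuple(
        rng.choices(forms, weights=probabilities, k=1)[0]
        for forms, probabilities in elements.values()
    )


def simulate(elements, n_trials, seed=0):
    """Tally the configurations thrown up by n_trials independent spins."""
    rng = random.Random(seed)
    return Counter(sample_configuration(elements, rng) for _ in range(n_trials))


# Hypothetical elements: name -> (manifestations, probabilities).
elements = {
    "element_1": (("low", "high"), (0.7, 0.3)),
    "element_2": (("low", "high"), (0.5, 0.5)),
    "element_3": (("low", "high"), (1.0, 0.0)),
}

tally = simulate(elements, n_trials=5000)
most_common_configuration, count = tally.most_common(1)[0]
```

The relative frequency of any configuration is its count divided by the
number of trials, and the most frequent configuration (the mode) is read
directly from the tally.
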
The origins of operational gaming are distinct from those of Monte Carlo.
Traditionally our military establishments, the Army and the Navy, have
conducted war games, not only to train their personnel in the handling of
equipment and their own persons under circumstances more nearly resembling
the conditions encountered in combat, but also to reexamine for the benefit
of the general staff old methods of warfare and test and develop fresh
tactics. It is the method of using a "material" model, scaled down several
steps from the phenomenon it is used to represent, to enable the human mind
to work out ideas that are too complex for it to retain. The architect uses
it when he makes a plaster of paris model of the capitol or museum he is
designing. On a more active level, that of design in motion, a football
coach uses it when he puts his men through their paces in the field,
evolving a new attack formation.

Although the writer did not consciously intend to develop the point, there
is considerable identity of form and function between the football
situation and the war games of the military. As he sees it, the most
important aspect of operational gaming is this introduction of the factor
of human psychology, particularly as it operates under conditions of stress
such as competition, into a model that otherwise represents a purely
mechanical or purely natural situation, i.e., a situation in which the
human element is absent. If we are to rely on simulation devices, under
which category operational gaming must be included, there seems to be no
other way of introducing the human element than by the use of human
participants. A theory of human behavior, especially in the area bearing
heavily on the problem under study, could be employed in place of active,
living human beings, but then, whatever might be true of other levels,
there would be no simulation on the human level.

The first of the papers on the program that the writer chairmanned
had this property of combining mechanical means with the human element as
provided by living beings. It might be better, in order to bring into
relief the particular interplay of the human and mechanical factors in
this paper ("Some Differences in Experimental Data" by A. J. Eckles, III),
reproduced  elsewhere  in  these  Proceedings,  not  to  talk  about  it  directly 
but  to  give  the  gist  of  a  telephone  conversation  the  writer  had  with  its 
author. 

The  problem  of  measuring  the  effectiveness  of  a  means  of  destruction, 
which for simpler weapons is reduced to that of calculating a hit
probability, is an old one. If a new rifle was invented, or a new type of bullet,
the "classical" method to ascertain its hit probability was to set up a
stationary mount or screen and have it shot at from a firing line a fixed
distance removed. The number of hits would then determine the hit probability.
Little or no attention was paid to the circumstance that several neighboring
holes in the screen might represent the injury or death of the same soldier,
and the problem of overkilling was thus largely neglected.


A more grievous error, fundamental in character, was to assume that an
estimate of the rifle's damage capabilities under the static, unruffled
conditions of a firing range could be equated to its power to damage on a
battlefield. One could make theoretical allowances for the kaleidoscopic
changeability of the battlefield and the impact on the infantryman's
nerves of the bustle and fire, a sort of theoretical simulation, but might
it not be more accurate to introduce these factors deliberately into the
model, to a degree compatible with safety considerations, in the form of
mobile human participants, moving target representations, etc.? Simple
mechanical factors can continue to be corrected without simulation; e.g.,
one can qualitatively decide that a boat-tail bullet, which was superior to
a flat-base projectile on the testing grounds because of the extra 1000
yards of range it gave, was nevertheless inferior in actual battle, where
the variety of obstacles reduces the importance of range and it is
incommodious to replace the rifle barrels that are constantly being worn out
by the heavier bullet.


A  first  approximation,  still  quite  crude,  to  the  realism  of  the 
battlefield,  suggested  by  this  line  of  thought,  is  to  substitute  irregularly 
moving  mounts  for  the  stationary  ones  that  are  ordinarily  used  to  determine 
the  kill  probabilities  of  simple  weapons  on  a  firing  range.  A  target  must 
enter one's visual field and be "centered" there before it can be fired
at accurately, and the adjustments that are necessary for a target popping
up at one suddenly are altogether different from those demanded by a stationary
target at which one will be firing away for some time.


Eckles and his group at the ORO have been making more realistic
determinations of the effectiveness of a tank, as determined by the training
of its crew, the construction of its guns and turrets, etc., by having it
ride down a trail and face "targets" that show up suddenly and then dart
away. If these "targets" are "anti-tank guns" engaging in this limited
war game according to certain rules, a conception of the effectiveness of a
particular species of tank against anti-tank weapons is obtained. Still
more realistically, one can have a tank platoon engage "enemy tanks" and
"infantry"  in  a  mock  battle  in  the  day  or  at  night  under  given  terrain 
conditions,  the  friendly  platoon  being  assigned  a  specific  objective  to 
be  taken  with  the  assistance  of  a  given  quantity  of  aerial  or  artillery 
support. 


Such simulated tank encounters have been used by others before. What
distinguishes Eckles' efforts is the extreme lengths to which he has gone
to achieve realism; the electronics laboratory at the ORO has wired the
panels representing enemy tanks so that they light up to simulate opening
fire, continue "firing" while they are intact, and burst into flames when
damaged by armor-piercing rounds. One would imagine that the cost of
conducting such an experiment, other than symbolically on an office
checkerboard, is excessive, which it would be if one had to stage it in the
countryside from scratch; but the Army regularly conducts maneuvers that do
not differ immensely from this conception as part of its training program,
and as Eckles points out, if engineers and scientists are willing to enter
into a cooperative relationship with it, they can obtain a massive amount
of information useful both to themselves and the military at little
additional expense.

The other paper which will be commented on has to do with the
application of Monte Carlo to compute lethal areas. "Lethal area" is an old
notion in ordnance research; it is that portion of an "initial" area in
which an appropriate target will be incapacitated by a weapon system whose
properties are known; the ratio of the two areas gives the probability
that the target will be incapacitated when placed at random in the "initial",
larger area, so that "lethal area" is properly a probabilistic rather
than purely analytic concept. This ratio is a kill probability with a
geographic  reference.  Besides  the  area  and  the  location  of  the  weapon 
system  in  relation  to  it,  there  are  many  other  parameters  inherent  in  the 
system  and  the  particular  use  to  which  it  is  being  put  at  the  time,  all  of 
them  subject  to  probability  distributions  of  their  own,  which  determine  the 
ratio. 


We  have  already  seen  that  the  Monte  Carlo  method  is  able  to  evoke  the 
myriad  manifestations  (configurations)  of  a  phenomenon  by  playing  a  game 
of  chance  on  each  of  the  phenomenon's  intrinsic  elements.  By  using  a 
table of random numbers to decide which value in its distribution of
values any one parameter is to assume at a particular time, we can
determine the form that the lethal area takes at the time; the aforementioned
ratio is then determined automatically. We cannot go into the further
details of Dr. Ehrenfeld's paper, which was classified "confidential",
since we desire to keep these remarks unclassified. A point in his favor
is that there is provision for estimating the ratio of the lethal to the
"initial" area by means of confidence intervals.
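
The procedure just described can be sketched in a few lines of code. What
follows is a minimal illustration only, not a reconstruction of the
classified paper's method. The square "initial" area, the circular lethal
zone, the function name estimate_lethal_ratio, and every distribution and
numerical value are assumptions chosen for the example. Each trial places
the target at random, draws each parameter from its own distribution, and
records whether the target is incapacitated; the fraction of such trials
estimates the ratio, and a normal-approximation interval stands in for
whatever confidence-interval method the paper actually used.

```python
import math
import random


def estimate_lethal_ratio(n_trials, initial_side=100.0, seed=0):
    """Estimate the lethal-to-initial area ratio by Monte Carlo sampling.

    All distributions below are hypothetical stand-ins for the weapon
    system's parameters; none of them come from the classified paper.
    """
    rng = random.Random(seed)
    half = initial_side / 2.0
    kills = 0
    for _ in range(n_trials):
        # Place the target at random in the square "initial" area.
        tx = rng.uniform(-half, half)
        ty = rng.uniform(-half, half)
        # Draw each system parameter from its own (assumed) distribution.
        bx = rng.gauss(0.0, 10.0)         # burst point, x (aiming error)
        by = rng.gauss(0.0, 10.0)         # burst point, y (aiming error)
        radius = rng.uniform(15.0, 25.0)  # lethal radius of this round
        if math.hypot(tx - bx, ty - by) <= radius:
            kills += 1
    p_hat = kills / n_trials
    # Normal-approximation 95% confidence interval for a proportion.
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_trials)
    low = max(0.0, p_hat - half_width)
    high = min(1.0, p_hat + half_width)
    return p_hat, (low, high)


if __name__ == "__main__":
    ratio, (low, high) = estimate_lethal_ratio(20_000)
    print(f"estimated ratio: {ratio:.3f}  (95% CI {low:.3f} to {high:.3f})")
```

Increasing n_trials narrows the interval roughly in proportion to the
square root of the number of trials, which is the usual price of a
sampling estimate.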


SOME  DIFFERENCES  IN  EXPERIMENTAL  DATA 
A. J. Eckles, III
Operations  Research  Office 

Perhaps the title of this presentation is somewhat of a misnomer. But I
do hope that it is not too misleading. Essentially, I would like to talk
for a few minutes about some of the different types of experiments as I see
them, and the necessarily different techniques of design, analysis and
control which are required. I will not refer to technicalities such as the
choice between a Graeco-Latin square vs. a partial factorial, or whether we
should use non-parametric or parametric techniques of analysis. In essence,
these are only the tools of our trade, and should be adapted to the situation
at hand. However, I might imply that the most suitable designs presently
available for military field experimentation are the more simple ones, and
the best techniques of analysis which meet the necessary assumptions (or
lack of assumptions) in this type of work are non-parametric.

I would first like to mention some relevant background material. The
primary purpose of conducting military research is to provide us with data
from which we can predict, with some degree of accuracy (upon which our
lives, and perhaps even our freedom might depend), the outcome of future
combat actions in which a variety of weapons systems are used. Once we
can do this, it is then a relatively simple matter to select those systems
which give us the highest probability of success.

Now we attempt prediction by a variety of devious means (short of
actual combat) in which we construct models, extrapolate from performance
characteristics, etc., until we finally reach conclusions and make
recommendations as to the relative value of a particular weapons system.

But here we are faced with a major difficulty: Just what sort of
performance data for each weapon system shall we use in our model? It
is quite evident that if our models approach reality then they, too, will
be affected by important changes in performance characteristics for the
various weapons systems. We could, of course, ascribe a particular set of
desirable characteristics to a new weapons system, and then determine the
effects that such a system would probably have on the outcome of a particular
type of battle. To a large degree this is done in the better grade Science
Fiction novels, where we carry this extrapolation one step further
(therefore becoming more realistic) and ascribe a particular set of characteristics
to our human actors.

To be quite frank, I have been thinking of doing this as a preliminary
step in the night-fighting program at ORO. But in this case, of course,
I would prefer to dignify the process by giving it a different name than
"Science Fiction"; probably just dropping the word "fiction" would help some.
We could set up a particular battlefield situation in which the action takes
place at night. Then we could examine the outcome of the battles if the
opposing forces were variously equipped for night combat. For example, if
the enemy had IR and we had white light; or the enemy had nothing and we
had far IR imaging equipment; etc. After many machine hours and several
volumes of reports, I could probably conclude that the better the
performance characteristics of our fighting equipment and personnel combination,
the higher would be our chances of winning a battle.

But we still haven't solved the problem of exactly which performance
characteristics we should use in order to obtain a desired level of
performance in the field or in actual combat.

When a new weapon is in the "drawing board" stage, the designers feel
as though they have at least some idea of the future performance
characteristics. We can say with some assurance, for example, that an automatic
loading device in a tank will provide us with a higher potential cyclic
rate of fire than manual loading; or that with a suitable rangefinder
system we can obtain range data accurate enough to hit a man-size target at
1,000 yards quite consistently.

However,  I'm  proposing  here  that  we  can  never  hope  to  extrapolate 
from  drawing  board  characteristics,  manufacturer's  specifications  or 
even  "Army  Board"  or  "proving  ground"  type  data  and  predict  the  relative 
effectiveness  of  a  particular  weapon  in  a  combat  situation.  If  we  do  this, 
we must be certain that we add the term "fiction" behind our endeavors
in "Science" to avoid misleading our audience. In other words, we have
at best hopelessly limited ourselves to a system of arm-chair philosophy
because we choose to ignore the all-important interactions between the
so-called "human variable" and the weapon, and the higher order
interactions between the man-machine weapons system and the conditions under
which the actions take place.

Now I'm sure that it is not necessary to further justify to any of
you the need for realistic experimental data upon which to base our
predictions for the future. But the question I'm trying to bring out is:
which of the many types of experimental data should be utilized in order
to answer questions of importance to the Military?

I  would  like  to  present  one  example  which  will  illustrate  the  nature 
of the problems we face. First, consider the selection of a rifle for
combat. We can experimentally measure such factors as rates of fire,
accuracy of the weapon when fired from a machine rest, barrel life, etc.
These studies would not be what I would call Military field research.

What the military is really interested in is the over-all
casualty-producing effectiveness of the man-machine system when various types of weapons
are used. For example, the number of target hits (as different from the
number of targets hit) is not a measure of a weapon's performance in the
military situation unless we are willing to equate the killing of one man
ten times with the killing of ten men one time each.
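
The distinction between target hits and targets hit is easy to state in
code. The short sketch below is illustrative only: the function name
summarize_hits, the target labels, and the two firing records are
hypothetical and do not come from any field data. It shows two records
with the same number of hits but very different numbers of targets put
out of action.

```python
from collections import Counter


def summarize_hits(hit_log):
    """Return (total hits, distinct targets hit) for a list of target ids.

    Each entry in hit_log is the id of the target struck by one round.
    """
    per_target = Counter(hit_log)
    total_hits = sum(per_target.values())
    targets_hit = len(per_target)
    return total_hits, targets_hit


# Hypothetical firing records: ten hits each, on a field of ten targets.
overkill = ["T1"] * 10                     # one man "killed" ten times
spread = [f"T{i}" for i in range(1, 11)]   # ten men hit once each

print(summarize_hits(overkill))  # (10, 1)
print(summarize_hits(spread))    # (10, 10)
```

A criterion based on the first number alone cannot tell these two records
apart; one based on the second can.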

And  while  such  factors  as  rates  of  fire  and  potential  accuracy  are 
undoubtedly  related  in  some  presently  unknown  and  undoubtedly  non-linear 
way  to  the  combat  effectiveness  of  the  rifle-man  weapons  system,  the  only 
manner  of  actually  predicting  the  effectiveness  of  such  a  system  is  to 
conduct a field study in which we use a suitable realistic criterion
measure. And this is, I believe, at the present time the area of military
research which presents the greatest problems: the development of realistic
criterion measures which can be used in the conduct of field experiments.

It  has  often  been  said  that  in  order  to  conduct  "Field  Experiments", 
the  scientist  moves  his  "laboratory"  out  into  the  "field"  to  collect  his 
data.  This  is  perhaps  true  in  the  non-military  types  of  field  studies 
such as those currently being conducted in rocket research, lethal radius
of burst from projectiles, barrel erosion, etc. But when we become
involved in military research, which includes the utilization of military
units  with  all  the  concomitant  problems  of  man-machine  interactions, 
and  the  host  of  differences  attributed  to  the  human  variable,  we  must  admit 
that  the  problems  faced  in  field  research  are  vastly  different  from  those 
faced  in  the  laboratory. 

In the laboratory, where we examine relatively simple phenomena
(such as the fluttering of a relay, the time of projectile flight, growth
of corn, behavior of rats, or the performance of memory), we can afford to
indulge our whims and use complex experimental designs and their necessary
techniques of analysis. However, in the area of military field research,
where the important problems are highly complex, we usually find that our
requirements are much more efficiently met by quite simple designs, and
even the simpler techniques of data analysis (primarily, of course,
because these simpler techniques, such as non-parametric statistics,
require that fewer assumptions be made about the conditions of data
collection).
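
As one illustration of how few assumptions such a technique needs, here
is a minimal sketch of the sign test for paired observations. The talk
does not name a particular test, so the choice of the sign test, the
function name sign_test_p_value, and the sample scores are all
assumptions made for the example. The test uses only the direction of
each paired difference and assumes nothing about the shape of the
underlying distribution.

```python
from math import comb


def sign_test_p_value(before, after):
    """Two-sided sign test for paired observations.

    Ties are dropped. Under the null hypothesis each remaining pair is
    equally likely to favor either condition, so the number of positive
    differences follows a Binomial(n, 1/2) distribution.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Probability of a split at least this lopsided, in either direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical scores (targets hit) for eight crews under two conditions.
condition_a = [3, 4, 2, 5, 3, 4, 2, 3]
condition_b = [5, 6, 4, 5, 6, 7, 3, 5]
print(sign_test_p_value(condition_a, condition_b))  # 0.015625
```

In this made-up record seven crews improve and one ties, so the tie is
dropped and the two-sided probability is 2 x (1/2)^7.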

Now  I  appreciate  your  being  patient  with  me  as  I  may  have  wandered 
around  the  proverbial  barn,  but  I  felt  that  it  was  necessary  to  present 
some of the problems which have forced us to try a relatively new method
of attacking the problems of military field research. In addition to the
problems I've mentioned above (i.e., adequate control, suitable criterion
measures, etc.), we also have the very practical problems of expense, both
in money and in man-hours and equipment. We just have to face the fact
that  it  is  difficult  to  conduct  the  large  number  of  field  studies  which 
are  urgently  required.  (And  here  I  would  like  to  refer  you  to  a  talk 
tomorrow  which  will  be  made  by  Lt.  Col.  Clement,  which  will  give  many 
practical  suggestions  for  urgently  needed  research.) 

So in order to find a practical solution for obtaining realistic
data, we are going to try to develop what we call a working symbiotic
relationship between ORO Field Teams and Army Post-cycle training programs.
We feel that at the present time there is a large source of data in the
Army Training Programs which is going to waste simply because we have not
yet developed suitable systems and techniques of data collection. By
using such techniques there will be no shortage of experimental subjects,
and our samples can be as large as we wish. The supplies and equipment
available are, compared to previous field studies, inexhaustible. The
only "real" cost to obtain this data is what is required for instrumentation
and researcher salaries.

What  we  propose  is  truly  a  symbiotic  relationship,  not  a  parasitical 
one,  for  the  military  gain  as  much  from  these  techniques  as  does  the 
research  worker,  and  in  most  cases  even  more.  Their  direct  gains  are 
primarily  in  the  form  of  increased  realism  in  the  training  program,  and 
the  concomitant  increase  in  troop  motivations. 


In essence, this technique requires that we superimpose simple
experimental designs and data collection techniques over the Army training
programs. Of course, the designs used must be simple to follow in the
field to minimize the control problems and interference with necessary
military procedures. And in order that the resulting data have greater
value,  the  instrumentation  must  not  detract  from  the  normal  operations,  but 
rather  increase  the  realism  where  possible. 

Before  spending  a  few  minutes  describing  one  application  of  this 
symbiotic  relationship,  I  would  like  to  discuss  some  of  the  differences 
between  data  obtained  in  this  manner  and  data  which  might  be  obtained 
from  the  conduct  of  a  specific  experiment.  Essentially,  we  would  find 
the  following  differences. 

1. Our control is not always what we would like. In many cases we
are in the position of astronomers who can only record the events as
they happen, but are limited in the manipulations which they can perform.
In other cases safety precautions force us to utilize situations which are
unrealistic.

2. In compensation for our lack of rigid controls, however, we are
able to utilize continuing cycles of training, thus increasing our sample
size far beyond what we could expect to demand in a specifically conducted
experiment.

3. We have time between runs to "Debug" our program, improve our
data collection system, and build our design as we progress. (Though this
might violate some of our current thinking, i.e., that we complete our
experimental design, including the methods of data analysis, prior to the
conduct of the study.)

I would now like to spend a few moments in giving you a brief
description of how we plan to utilize this technique of "SYMBION" in order to
collect one type of experimental data.

Fort  Stewart,  Georgia,  is  presently  conducting  as  part  of  their 
regularly  scheduled  training  program  a  problem  which  involved  a  tank  platoon 
in  a  night  attack,  using  live  ammunition.  This  problem  was  called  the 
T-2 exercise. Essentially this was a free-play exercise in which the
platoon leader was assigned the mission of taking his objective by a
night attack, when the objective was defended by enemy tanks and infantry.
In this attack he was supported by a 60-inch searchlight. The enemy tanks
were  represented  by  the  standard  6x6  panel  targets,  and  the  enemy  infantry 
by  the  standard  Type  E  targets.  The  attacking  platoon  would  be  notified 
by  radio  that  they  were  under  enemy  fire  at  an  appropriate  time  during 
their  advance,  and  they  would  then  undertake  to  fire  upon  the  targets 
until  all  of  their  ammunition  was  expended. 

It was the normal conduct of this T-2 exercise and the close
cooperation by the officers and men of Fort Stewart which have made it possible
for the ORO field team to design and conduct the present research project
in night fighting. On the part of Fort Stewart, they have permitted the
use of their training program, with the necessary modification, to change
the T-2 exercise into a veritable "laboratory-in-the-field." This has,
of course, required additional effort from both the officers and supporting
personnel, and a willingness to put up with the needs and desires of the
scientist. But in return for these additional burdens, the scientists
from ORO have added realism and meaningfulness to the training program.


For example, the Electronics Laboratory at ORO has designed and supplied
a new type target to simulate the enemy tanks. These targets, rather than
being simple, passive panels, initiate the engagement by simulating
opening fire upon the attacking platoon. The targets then continue to "fire"
upon the platoon being tested until they are hit by an AP round (small arms
fire and small fragment hits have no effect). When finally hit by an AP
round, the newly developed ORO targets stop firing and burst into flames
to  simulate  a  burning  enemy  tank. 


Throughout this rather realistic engagement, the field team from ORO
is busily collecting and recording appropriate data which will provide a
measure of the platoon's effectiveness in night combat.


Over  a  period  of  several  months,  by  testing  a  number  of  units  equipped 
with  a  variety  of  night  fighting  equipment  -  such  as  tank  mounted  fighting 
lights,  infra-red  equipment,  pyrotechnics,  etc.  -  this  joint  ORO-Fort 
Stewart  project  will  not  only  better  prepare  these  units  for  night  combat, 
but  also  provide  us  with  the  answers  to  a  number  of  questions  about  our 
present  capabilities  for  night  operations.  Questions  such  as  the  relative 
fire  effectiveness  of  armored  platoons  when  equipped  with  various  types 
of  equipment,  hit  probabilities,  and  rates  of  fire  of  our  tanks  under 
various  types  of  illumination,  etc.,  will  be  at  least  partially  answered 
by the first phase of Project SYMBION.


In summary, then, I've been making a plea for more data of the type
which is obtained from operationally realistic field experiments, in
contrast to the type of data obtained in most "laboratory-type" or
"proving-ground-type" studies. And I have proposed a possible technique, "SYMBION",
for obtaining this type of data with minimum expense. In fact, ORO has
designed such a program with the cooperation of the Officers of Fort Stewart,
Georgia, which will begin this October (1956).


THE APPLICATION OF DESIGN OF EXPERIMENTS AND MODELING
TECHNIQUES TO COMPLEX WEAPONS SYSTEMS
E. Biser and M. Meyerson
Signal Corps Engineering Laboratories

1.  Purpose.  The  purpose  of  this  paper  is  to  outline  a  conceptual  plan 
and  framework  that  was  used  to  establish  a  Design  of  Experiments  for  a 
weapons  system.  Further,  the  paper  will  indicate  the  application  of  a 
model  for  analyzing  the  system. 


2. Background. During World War II, it became apparent to antiaircraft
experts that, although individual antiaircraft gun batteries were
relatively effective against single targets, the defense of a critical
objective, as a whole, against large target raids, was relatively
ineffective. Consequently, military requirements were formulated for an
integrated system, wherein all the processes of AA defense could be
coordinated, resulting in an overall increased system effectiveness.

A system was proposed by the Signal Corps Engineering Laboratories,
Fort Monmouth, New Jersey, approved, developed, constructed, installed
and readied for test. This paper describes the processes which were
involved in developing the test plan, some of the general tests, and
the final consideration of the efficacy of this complex system.

Although the system has been completely tested, broken down to basic
sub-systems and given to other agencies for research and development,
it has served this purpose well, and the concepts described herein have
formed the basis for evaluating all other systems of this type under
Army Signal Corps cognizance.


3. Discussion.


a. Design of Experiments. Although many definitions exist for this
term, a most appropriate one for the purpose of this paper might be that
depicted in Figure 1.* Here the system is shown as a series of
symbols depicting the man-machine combinations and interactions, all
combining to produce a desired objective. The purpose of the
experimental design, then, is to adequately define the desired objective (or
objectives), test the system to measure that objective, and then to
determine the contribution of each system block toward the desired
objective.


In the light of the basic objective, we were confronted with the
fact that we had a new system that would obviously be compared with an
existing system prior to the time Army Staff might accept it for
standardized issue. Hence, we considered it advisable to analyze and to clarify
the following semantical equation: our goal is to measure the improvement
of this newly-proposed Weapons System over existing Antiaircraft Defense
Systems. The sentence can best be investigated by symbolizing "Improvement"
by (1), "Newly Proposed Weapons System" by (2), and "Existing Systems" by
(3), as follows:


(1) Improvement: The following relevant questions naturally
present themselves concerning the concept of improvement:


* Figures appear at end of the article.

(a) What is meant by improvement?

(b)  What  are  its  criteria? 

(c) What magnitude of improvement is to be discussed and
analyzed? 

(d)  What  is  the  optimum  method  of  measurement  of  improvement? 

(e) Who has to be convinced that the method of analysis and
especially that the design of experiment has yielded significant and
worthwhile results regarding improvement?

1. What is meant by Improvement? There are two main
areas where improvement is urgently needed, as follows:

a. Rational distribution of fire. By this is meant
a firing doctrine or rationale that optimizes minimum damage to the defended
area, attrition, or prevention of penetration by spreading AA fire over the
entire attacking raid.

b. Improved intelligence on air raids, i.e., with
respect to detection and identification of targets. While rational
distribution of fire is readily given to quantitative evaluation, improved
intelligence, although contributing greatly towards overall system
effectiveness, is not easily quantifiable. It should be noted that it may not
be possible to evaluate the measure of rational distribution of fire
without taking cognizance of improved intelligence on air raids.


Here, the test designer quantifies the basic
test  objectives,  for  which  all  following  concepts  and  the  actual  tests 
will  be  designed. 
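
To see why rational distribution of fire lends itself to quantitative
evaluation, consider a small worked calculation. It is a sketch under
stated assumptions, not the authors' model: every round is taken to be
independent and to have the same single-shot kill probability, and the
function name expected_kills and the numbers used are hypothetical. A
target receiving k rounds is destroyed with probability 1 - (1 - p)^k, so
the expected number of targets destroyed is the sum of that quantity over
all targets.

```python
def expected_kills(shots_per_target, p_kill):
    """Expected number of targets destroyed.

    shots_per_target[j] is the number of independent rounds assigned to
    target j; each round kills its target with probability p_kill.
    """
    return sum(1.0 - (1.0 - p_kill) ** k for k in shots_per_target)


# Four rounds against a raid of four targets, with p_kill = 0.3 (assumed).
concentrated = expected_kills([4, 0, 0, 0], 0.3)  # all fire on one target
distributed = expected_kills([1, 1, 1, 1], 0.3)   # fire spread over raid

print(round(concentrated, 4))  # 0.7599
print(round(distributed, 4))   # 1.2
```

Under these assumptions the same four rounds destroy, on average, about
0.76 of a target when concentrated and 1.2 targets when spread over the
raid.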

2. What is to be the criterion or criteria of Improvement?
This is a vital question since it will have a great bearing on the
type of defense index to be quantified. The criterion of improvement may
consist of the optimization of defense per dollar spent. This concept can
be further narrowed down and particularized to the following quantifiable
parameters:

a.  Least  damage  to  the  defended  area  per  dollar  spent 
on  antiaircraft  defense  for  that  area.  This  indicates  that  the  aim  of 
building a defense system is to prevent damage (i.e., physical, psychological,
productive, et al.) to a defended area above a predetermined minimum. Here
damage is the independent variable and is established at a value above which
the war potential of the area is seriously or completely hampered.

b. Maximum damage to enemy raiders per dollar spent
on antiaircraft defense. This stresses that the objective of building a
defense system is to insure a predicted maximum attrition (i.e., the loss
to the enemy of his attacking aircraft and consequent destructive
potential) for a given area. Here attrition is the independent variable
and is established at a value above which a certain number of potentially
destructive enemy aircraft would elude the defenses.

c. Lowest probability of penetration by enemy
raiders into the defended area per dollar spent on antiaircraft defense.
This states that the goal of building a defense system is to insure the
prevention of a certain percentage of enemy penetration to a defended
area. Here prevention of penetration is the independent variable and
is established at a value below which a certain number of potentially
destructive enemy aircraft would penetrate the defenses.

Here  the  designer  offers  some  food  for  thought 
for  which  the  objective  may  be  measured. 

3. Magnitude of Improvement: It is necessary to assign a measure-number to
the concept of improvement, since this number will tend to give a decisive
indication of the efficacy of this integrated defense system. It is
estimated that the following magnitudes of improvement of the newly-proposed
system over existing systems might be expected:

a. For low kill probability weapons in the system subjected to saturated
types of raids, a small improvement might be expected with respect to the
three parameters mentioned above, since even coordination of low
kill-probability weapons does not materially increase their overall
effectiveness (determined by allied studies). The contribution towards this
overall improvement is due to rational distribution of fire, as well as to
improved intelligence on air raids.

In the case of these low kill probability weapons, however, because of
their low kill probability, improved air raid intelligence, though not
rigorously quantifiable, appears to contribute most to overall improvement
with respect to the aforementioned three parameters. In the light of these
considerations, it would appear that experimental research could better be
concentrated on improvement of intelligence, and analytical research
pursued in the area of rational distribution of fire for these weapons.

Here the designer actually recommends where tests and analysis could best
be utilized for maximum economy.

b. For other weapons, because of their higher kill probability, it is
anticipated that a greater improvement with respect to the aforementioned
parameters could be attained. In this case, for reasons alluded to
previously, it would appear that experimental and analytical research
should be equally apportioned with respect to rational distribution of fire
and improved intelligence.

Here,  again,  the  designer  indicates  the  type 
of  effort  to  be  expended,  but  for  different  weapons. 

4. Optimum Method of Measurement: The consideration of optimum (but
practical) methods of measurement and comparison of the newly-proposed
system with existing systems entails the following two modes of comparison:

a. Comparison on a simulated basis, with only the output (i.e., weapon
battery firing) being simulated. This means that aircraft will actually be
flying and effective kills calculated on a simulated weapon battery firing
basis.

b. Comparison of systems by simulating both the input (Target Simulator),
with aircraft not flying, and the output (AADECAR, Antiaircraft Defense
Effectiveness Computer and Recorder), with weapon batteries not firing.

Here the designer specifies the nature of the test and even some of the
major test equipment to be used.

5. Personnel Interested in Analysis and Findings:

Three different primary agencies and interests are concerned with the
results of the analysis and the findings of the experimental design:

a. Army Antiaircraft Command, the ultimate user of the equipment, is
interested from the standpoint of operability, reliability, and overall
effectiveness as a tactical weapons system.

b. Continental Army Command, as the experimental arm of the Army for
systems of this type, is concerned with the verification of operational
concepts set forth in the military characteristics, as well as with
operability and reliability.

c. Signal Corps, as the technical service, is concerned with obtaining
technical data on all significant factors which affect the overall system
design.

Here the designer has indicated that the final test results must be in
such a form as to be readily understandable to different agencies, with
different interests, all of whom will draw conclusions regarding the system
efficacy.

(2)  The  Newly  Proposed  System.  Since  the  system  is  not  a  static 
model,  it  is  worth  noting  that  a  description  of  that  system  falls  into 
two categories as follows:

(a) The present installation consists basically of detection,
identification, data processing, tactical evaluation, assignment,
acquisition, tracking and engagement functions, with interconnecting
communications (further details will not be revealed here because of the
classification of the information, and since it is not particularly germane
to this discussion).

Here the designer actually described the system, so that the establishment
of the mathematical model, and the ultimate conclusions regarding the
contribution of each of the major system blocks will have the same meaning.

(b) A short term improvement version of the present installation with
improved technical, tactical and operational facilities (again no further
details are necessary here).

Here, again, the designer recognizes the logical progression to a slightly
improved model which will also be covered by this evaluation.

(3) Existing Systems: The existing system is used as a reference
system  with  respect  to  which  improvements  are  to  be  measured.  This  system 
is  then  defined  (not  in  this  paper)  in  the  same  manner  as  was  the  newly 
proposed  system. 

With the foregoing clearly established, the designer would then focus
attention on some of the crucial factors that are likely to affect system
effectiveness. Some of the factors considered, singly and severally, were
as follows:

Broad Factors:

Performance of man-machine system subjected to saturated types of raids.

Performance of man-machines under conditions of jamming and clutter.

Performance of data processing equipment in response to diverse and
complex courses.

Capability of human operator to perform assigned tasks under adverse
conditions of complex and saturated raids.

Detailed Factors:

Rate of entry of targets.

Reliability and resolution of identification sets in the system.

Effect of radar resolution at ranges of primary interest on the
operation of the system.

Resolution  and  readability  of  displays  and  boards. 

Effect  of  battery  acquisition  time  on  system  effectiveness. 

Having thus established the conceptual framework for the test, the
next step was to model the system so that it might best be analyzed.

b. The Modeling Approach

Although the term is no stranger to science, none is more frequently used
in current literature on operations research than that of model. Indeed,
the concept of model has come to connote the hallmark and canon of
scientific method and intelligibility. Scientists have given substance to
the ideas embodied in their theories by means of mental pictures or
physical models, such as models of ships, railroads and airplanes (to study
flight characteristics), just to mention a few static models.


The questions naturally arise: what is a mathematical model, and how is it
helpful in describing the functions of a large scale man-machine system?
What is it purported to do? What are its constitutive elements? How is it
constructed, etc.?

One word singularly expresses the most essential meaning and significance
of model: it is the term symbol. A symbol is a representation of an event.
This term, then, is the key to the following compact definition of a
mathematical model. A mathematical model is a symbolic representation of a
system (the domain of phenomena under investigation).

(1)  Weapon  Systems 

Before analyzing the structure of a model, let us pause briefly to review
the peculiar nature of a weapon system. A weapon system is an organization
of men and equipment designed for operation and use against a class of
entities known as targets. In order to carry out its overall function, it
must also carry out many complex subfunctions.

The function of the system can be functionally subdivided into many
different activities, depending upon the kind and types of activity to be
carried out. Each functional activity requires certain quantitative inputs
to be converted by this functional activity into another quantity called
output.

A Weapon System, for instance, consists of observation units, information
processing units, and action units. It contains communication facilities to
handle classes of information such as weapon information, target
information, etc.

The concept of Model is predicated on the assumption that it is possible
to abstract, from a complex system, certain persistent and discernible
relationships and to mathematize and quantify these relations with a view
of describing the behavior of the system. The initial stages of modeling
consist of devising concepts that describe the purpose, functions,
operations, pertinent parameters or state variables, all of which go toward
erecting the frame of reference for the mathematical model to be operative.
This was accomplished in the earlier portion of this paper. The goal is to
construct a model so that, by studying its characteristics, it will be
possible to deduce the state of the system (the output of the system) under
varying conditions (raid configurations).

(2) The Objective:


The main objective is to construct a theoretical-experimental model,
hereafter to be referred to as a mathematical model, for evaluating the
efficacy of the weapon system and to evolve intrinsic and comparative
criteria and measures of effectiveness. The aim of the model is to
establish a theoretical-experimental structure within which the large scale
man-machine system is to be evaluated with respect to certain predetermined
criteria of effectiveness, such as maximum defense, maximum attrition, etc.
The point of departure is that the best way of describing and evaluating
the large scale system is to construct a model involving quantifiable
parameters to predict the dependence and variation of each pertinent
parameter on each functional activity of the system. The model should
exhibit how the various functions of the system, such as detection,
identification, data processing, tactical evaluation, assignment to
weapons, acquisition, tracking, engagement and weapon characteristics
affect one another, i.e., how they are interrelated and interconnected.

The model envisaged here is not an aprioristic one, namely, one totally
divorced from test data and superimposed on the system, without recourse to
test data. It is not an axiomatic model so characteristic of abstract
mathematical systems defined implicitly by a set of axioms without regard
to any significance and meaning attributed to the symbols used. (The
significance of the symbols, in an axiomatic model, is governed solely by
the linguistic rules laid down by the axioms.) The model to be operational
in the experimental sense is not to be construed as a mathematical scheme,
or as an ensemble of a priori concepts to be arbitrarily imposed on the
operations of the system.

Such concepts untested and not subjected to experimental control would be
sheer intellectual ghosts without operational efficacy and meaning. It is
clear that the importance of test data cannot be gainsaid. Nor can they be
dispensed with in the modeling approach. It is a realistic system (or a
class of structurally similar systems) whose behavior, output and time
response are to be described and predicted by a theoretical model. The
weapon tests (with live and simulated inputs) will provide data that, when
properly reduced, will provide unbiased statistical estimates of
significant parameters. It is these parameters that are to form the basic
structural elements of the model.

The test data will provide the quantitative empirical data to fill out the
model and to validate the model experimentally. It is the model, through
its predictive efficacy, that is to describe and to predict the response of
the system to varying inputs (raid configurations).

(The  flow  chart  in  Fig.  2  profiles,  by  block  diagram, 
the  distinctive,  logical,  and  sequential  steps  involved  in  system 
modeling.) 

(3) The Mathematical Model: The Weapon System is functionally divided into
the following activities or units:

Detection, identification, data processing (manual and/or automatic),
tactical evaluation, assignment of weapons to target, acquisition,
tracking, firing and ultimate kill. The partitioning of the overall
function of the system into these subfunctions was made advisedly consonant
with the concept of a weapon-complex as a dynamic or a time-response
system. It was natural to undertake measurements of time intervals (time
delays) corresponding to these functional activities.

These time delays are to be described as mathematical functions of input
parameters, such as range, radar cross section, velocity, etc.

The model is structurally isomorphic to a logical syllogism in which the
system is the major premise, the input is the minor premise and the output
is the conclusion. This basic syllogistic description of a system has
important implications. The conceptual scheme

input - system - output

can be expressed as follows:


Given a system and a class of inputs, to determine the output response
characteristics, expressed in the following symbolic equation:

\[ X_o(t) = \sum S(t) \;\mathrm{op}\; X_i(t) \]

This is symbolically analogous to an integral equation.

X_o(t) = the set of output responses of the system (to be described
subsequently).

X_i(t) = the set of inputs to the system.

S(t) = the set of transfer functions characterizing the system.

op = the coupling of the inputs to the system.

With this in mind, the model is to consist of the following structural
units:

(a) Model Parameters: These consist of eleven (11) functionally defined
time delays T1 to T11. In fact, these parameters are probability
distributions of the time intervals associated with various functions of
the system. It is to be noted that the term "parameters" is not to be
construed as a statistic such as mean, variance, etc., but as functional
variables which are in turn to be related to input variables.

The boundaries of the time intervals, the t's, are functionally defined as
follows.* Each functional definition is followed by the measurement that
fixes it.

t1 = time target entered system (detection and identification).
     Measurement: time^? of the beginning of telling the first early
     warning plot for a new target from higher headquarters (or the
     facility simulating it) to this system. Recorded on magnetic voice
     tape.

t2 = time target entered a track-while-scan computer (data processing).
     Measurement: time^? of first variance in track-while-scan computer
     IBM output.

t3 = time when track-while-scan computer first establishes a smooth track
     (data processing).
     Measurement: 1) time^? of first x punch for each target assignment to
     a track-while-scan computer. 2) time^? of appearance of white light
     next to channel number at the left of the Engagement Status Board.

t4 = time of first height information received by track-while-scan
     computer from height-finding radar (data processing).
     Measurement: 1) time^? of height dots on tactical display and
     background height report to confirm that height was not entered from
     early warning information. 2) time^? of first variance in
     track-while-scan computer recording output. Punch out and background
     height report to confirm that height was not entered from early
     warning information.

t5 = time target was assigned to a battery (assignment).
     Measurement: 1) time^? of appearance of battery letter on tactical
     display. 2) time^? of "on" signal in battery recording for the first
     time for a target-battery combination.

t6 = time target first examined at battery (assignment).
     Measurement: time^1 of first "on" punch for each target-battery
     combination.

t7 = time of target designation to battery tracking radar (acquisition).
     Measurement: time^? of first "on" punch for each target-battery
     combination.

t8 = time of target lock on by battery tracking radar (tracking).
     Measurement: time of first "on" punch for each target-battery
     combination.

t9 = time of fire (engagement).
     Measurement: 1) time^1 of first "on" punch for each target-battery
     combination. 2) time^? of the appearance of red firing light on
     Engagement Status Board for each target-battery combination.

t10 = time of missile impact (engagement).
     Measurement: time^? of first "on" punch for each target-battery
     combination.

t11 = time of battery "ready" for next assignments (transfer time).
     Measurement: time^? of first "on" punch for each target-battery
     combination.

t12 = time the next target is designated (transfer time).
     Measurement: time^1 of first punch for a target-battery combination
     which is preceded by punches in any column referring to another
     combination.

* Superscripts are explained after these definitions (a superscript shown
as ? is illegible in the source); remaining symbols are explained after the
instrumentation table.


The superscripts 1 and 5 indicate time measurements with one- and
five-second accuracy. The time intervals Ti (i = 1 to 11) are accordingly
defined as follows:

(1)* T1: time of entry of each target into system from early warning
     information to time each target is entered into a track-while-scan
     computer (t2 - t1) (detection and identification).

(2)* T2: time each target is entered into a track-while-scan computer to
     time of first smooth narrow gate tracking (t3 - t2) (data
     processing).

(3)* T3: time of first smooth narrow gate tracking to time height
     information is first available from height finder for each target
     (t4 - t3) (data processing).

(4)* T4: time height information is first available from height finder to
     time target is assigned to a battery for each target and, for any one
     target, each battery (t5 - t4) (tactical evaluation).

(5)  T5: time a target is assigned to a battery to time target is first
     examined on Battery Commander's PPI for each target-battery
     combination (t6 - t5) (assignment).

(6)  T6: time target is first examined to time target is designated to
     tracking radar for each target-battery combination (t7 - t6)
     (acquisition).

(7)  T7: time target is assigned to tracking radar to time tracking radar
     locks on target for each target-battery combination (t8 - t7)
     (acquisition).

(8)  T8: time tracking radar locks on target to time missile is "fired"
     for each target-battery combination (t9 - t8) (tracking).

(9)  T9: time missile is "fired" to time of missile "impact" on target for
     each target-battery combination (t10 - t9) (engagement).

(10) T10: time of missile "impact" on target to time battery is ready for
     reassignment for each target-battery combination (t11 - t10)
     (transfer time).

(11) T11: time battery is ready for reassignment to time a new target is
     designated to that battery for each battery and for each new target
     (t12 - t11) (transfer time).


* Currently the system instrumentation does not permit explicit separation
of T1 and T2. If they cannot be separated implicitly, or through a minor
instrumentation change, they will be carried in the analysis as T1 + T2.
The same applies to T3 and T4.
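
The interval definitions above amount to successive differences of the
boundary times. A minimal Python sketch follows, assuming the twelve
boundary times for one target-battery combination are held in a list, with
None standing for a boundary the instrumentation did not record. The
function name delay_intervals and this data layout are illustrative
assumptions and are not part of the test plan. Where an interior boundary
is missing, the adjacent intervals are carried as a sum (for example
T1+T2), as the footnote describes.

```python
from typing import Dict, List, Optional


def delay_intervals(t: List[Optional[float]]) -> Dict[str, float]:
    """Turn boundary times t1..t12 into delay intervals T1..T11.

    t[0] is t1 and t[11] is t12, in seconds; None marks a boundary that
    was not recorded. Intervals that cannot be separated because their
    shared boundary is missing are returned under a combined label.
    """
    if len(t) != 12:
        raise ValueError("expected the twelve boundary times t1..t12")
    out: Dict[str, float] = {}
    i = 0
    while i < 11:
        if t[i] is None:
            i += 1  # no recorded start for this interval; skip ahead
            continue
        j = i + 1
        while j < 12 and t[j] is None:
            j += 1  # look for the next recorded boundary
        if j == 12:
            break  # nothing recorded after t[i]
        label = "+".join(f"T{k + 1}" for k in range(i, j))
        out[label] = t[j] - t[i]
        i = j
    return out


# Hypothetical record in which t2 was not separately instrumented.
example = [0, None, 25, 40, 55, 60, 70, 78, 90, 120, 125, 140]
intervals = delay_intervals(example)
```

In this hypothetical record the first two intervals come out as a single
entry labelled "T1+T2", and the remaining entries are T3 through T11.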


MODEL PARAMETER    INSTRUMENTATION

T1     Target number, track-while-scan computer number, EW (early
       warning) voice and plots.

T2     a1, a2, Rr, c, n, Rs of targets. Also AF target number and
       track-while-scan computer number.

T3     [variables illegible], Rr, c, n, Rs of targets, target number,
       track-while-scan computer number, early warning height,
       track-while-scan output (x, y, h, ẋ, and ẏ).

T4     a1, a2, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer number, battery number, order of
       assignments to batteries, correlation time for each target,
       track-while-scan computer output, Command and Status signals.

T5     a1, a2, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer number, battery number, whether this is
       the 1st, 2nd, etc. target handled by a given battery,
       track-while-scan computer output, Command and Status signals.

T6     a1, a2, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer number, battery number, whether this is
       the 1st, 2nd, etc. target handled by a given battery,
       track-while-scan computer output, any track data by other
       batteries on target now being assigned to a battery radar during
       T6, Command and Status signals.

T7     a2, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer number, battery number, whether this is
       the 1st, 2nd, etc. target handled by a given battery,
       track-while-scan computer output, any track data by other
       batteries on target now being assigned to a battery radar during
       T7, Command and Status signals.

T8     a1, a2, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer (target) number, battery number, whether
       this is the 1st, 2nd, etc. target handled by a given battery,
       track-while-scan computer output, battery track, Command and
       Status signals.

T9     a1, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer (target) number, battery number, whether
       this is the 1st, 2nd, etc. target handled by a given battery,
       track-while-scan computer output, battery track.

T10    a1, a2, Rr, c, n, Rs of targets. Also AF target number,
       track-while-scan computer (target) number, battery number, whether
       this is the 1st, 2nd, etc. target handled by a given battery,
       track-while-scan computer output, battery track.

T11    a1, a2, Rr, c, n, Rs, θ of consecutive targets handled by a given
       battery. Also AF target numbers, track-while-scan computer
       numbers, battery number, track-while-scan computer output.


where

c = concentration
n = number of targets
Rr = range of resolution
Rs = slant range of target at initial point of time interval
ẋ, ẏ = target velocity components
x, y, h = target position coordinates
a1 = aspect to battery
a2 = aspect to operations center
θ = angle between radius vectors from a battery to consecutively handled
    targets (plan view)

(b) Input Variables: The input variables constitute those characteristics
and features of raids that go to determine wholly or in part the
effectiveness of an AA defense system. The input variables include height,
velocity, radar cross-section, early warning information, resolution range,
path of the target, etc. It is to be noted that the mathematical relation
of the input variables to each of the model parameters, the delay intervals
T1, ..., T11, is of paramount importance to the creation of the model.

(c) System Configuration Parameters: These include the number of
batteries, their location and relative distance among them, and the number
of operating batteries. Although these parameters primarily refer to the
geometry of the system, they include weapon characteristics such as kill
probability curves, maximum and minimum firing ranges, etc.

(d) System Logic: The system logic essentially describes how the system
operates on input data, what the operators do, how assignments are made,
under what conditions open fire commences, etc. The system logic thus
refers to the operating procedures with respect to a fixed set of input
variables; it contains standard operating procedures, as well as assignment
doctrines.

(e) Measures of Effectiveness: These constitute criteria that give an
explicit measure of the extent to which the defense system is attaining its
main objective. The concept of measure of effectiveness, in effect, implies
that the goal and operations of the system are clearly, significantly and
explicitly stated. In fact the model in its entirety is built around the
measures of effectiveness which, in essence, define the goal of the system.
These objectives, as defined by the criteria of effectiveness, must be
self-consistent, since it is impossible to make consistent fundamentally
inconsistent goals.

An index is a number, a measure-number, and this number can hardly be
conceived without criteria by which the effectiveness of the weapon system
is to be assessed. An index is a measure-number indicative of the
effectiveness of the system with respect to predetermined criteria of
effectiveness. A Defense Index is the selected criterion quantified to
measure the output of the defense system. Thus, there is no index without
selected criteria and without operational data (test data). For example, if
maximum attrition (the maximization of the expected number of targets
destroyed) is the criterion chosen, the system subjected to a given class
of raid may have an index of 0.3 with respect to this criterion.


It is evident that the primary objective of a defense system is to "score"
against enemy planes. Hence, it follows that the statistical distribution
of planes destroyed by the system would yield all the information needed to
assess the capability and efficacy of the system against enemy targets.
Such a distribution will contain the expected number of targets destroyed
(E), the probability of non-penetration (Pnp), i.e., the probability that
all the planes in the raid are destroyed, the probability that, at most, a
specified number of planes survive, etc. The mathematical model envisaged
cannot be committed exclusively to the criteria of maximizing Pnp or E, the
concepts of maximum defense and maximum attrition respectively. In fact, it
is desirable to devise a more general class of criteria (in view of the
advisability of considering all possible enemy strategies) containing
maximum attrition (E) and Pnp as "limiting" criteria. Without unduly
belaboring the point, it is worth noting that realistic situations may
change to the extent of requiring the maximization of expected number
destroyed and, under varying conditions, the maximization of Pnp,
especially if the damage to the defended area is catastrophic if one or
several planes penetrate the defenses.


In short, given a kill probability density function and a damage function
of the number of enemy targets penetrating the defended area, it cannot be
stated categorically that damage to the defended area will be minimized by
maximizing either E or Pnp. In order to minimize the expected damage to a
defended area, the entire distribution of the number of planes surviving
(or destroyed), and not merely Pnp or E (T), the expected number of targets
destroyed, needs to be superimposed on the appropriate damage function.
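
The point can be illustrated with a small numerical sketch in Python. All
figures below are invented for illustration: a raid of two aircraft, three
hypothetical firing doctrines A, B and C, each described by the
probabilities that exactly 0, 1 or 2 raiders are destroyed, and a damage
function in which two penetrators do ten times the damage of one. The
function names are likewise assumptions of this sketch.

```python
from typing import Dict, Sequence


def expected_destroyed(p: Sequence[float]) -> float:
    """E: expected number destroyed, where p[i] = P(exactly i destroyed)."""
    return sum(i * p_i for i, p_i in enumerate(p))


def prob_non_penetration(p: Sequence[float]) -> float:
    """Pnp: probability that every raider is destroyed (the last entry)."""
    return p[-1]


def expected_damage(p: Sequence[float], damage: Sequence[float]) -> float:
    """Superimpose the whole distribution on the damage function.

    damage[k] is the damage done when exactly k raiders penetrate; with i
    raiders destroyed out of n, the number penetrating is n - i.
    """
    n = len(p) - 1
    return sum(p_i * damage[n - i] for i, p_i in enumerate(p))


# Hypothetical damage function: damage for 0, 1 and 2 penetrators.
DAMAGE = [0.0, 1.0, 10.0]

# Hypothetical doctrines: (P0, P1, P2) for a raid of n = 2 aircraft.
DOCTRINES: Dict[str, Sequence[float]] = {
    "A": (0.20, 0.00, 0.80),
    "B": (0.12, 0.10, 0.78),
    "C": (0.00, 0.50, 0.50),
}

best_by_e = max(DOCTRINES, key=lambda k: expected_destroyed(DOCTRINES[k]))
best_by_pnp = max(DOCTRINES, key=lambda k: prob_non_penetration(DOCTRINES[k]))
least_damage = min(DOCTRINES, key=lambda k: expected_damage(DOCTRINES[k], DAMAGE))
print(best_by_e, best_by_pnp, least_damage)
```

With these made-up figures, doctrine B maximizes E, doctrine A maximizes
Pnp, and doctrine C, which is best on neither criterion, gives the least
expected damage.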


To return to the conceptual scheme:

input - system - output

We can see that equipment, system logic, and system configuration
constitute the system and its operation. The output is given in terms of
the multinomial distribution {Pi}, where Pi is the probability that exactly
i targets are destroyed in a raid, i = 0, ..., n, and n is the number of
attacking aircraft: {Pi} = (P0, ..., Pi, ..., Pn).

Pn = Pnp = the probability of non-penetration.

This distribution, together with the criteria of effectiveness, will
determine the desired output (with respect to the selected criterion).

To summarize: The mathematical model consists of (1) the model parameters,
(2) the input variables, (3) system configurations, (4) system logic, and
(5) measures of effectiveness.


c. The Monte Carlo Technique:

With the multinomial distribution, $P_i$, as the primary output of
the Model, the question naturally arises how this distribution is
calculated. It would indeed be desirable to determine analytically the exact
distribution of the planes destroyed. At this stage, however, this goal
is well nigh impossible of attainment. It should be noted that this
Model is a probabilistic one because of the stochastic nature of the
model parameters and weapon characteristics. The Monte Carlo method is
eminently suited to estimate the distribution, since this method is
essentially a sampling experiment making use of large tables of random
numbers. The term "Monte Carlo" is descriptive of a whole class of
calculational techniques called stochastic because of the use of random
numbers.

The aim of this technique is to find a stochastic process that
has a distribution corresponding to the physical situation under
investigation. (Strictly speaking, this method cannot yield the entire
distribution.)

A high-speed digital computer is utilized to implement the
substitution of a stochastic procedure for an analytic model of the system.
What the computer is actually doing is to sample from the exact distribution
in order to estimate it. The exact distribution of the real situation
is approached more and more closely as more runs are made on the computer
(this is based on the law of large numbers). Representative samples are
being followed through their histories to obtain an approximation to the
entire distribution.
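The sampling idea can be sketched in a few lines. The engagement model below is a deliberately crude stand-in, assumed only for illustration: each of the $n$ planes is destroyed independently with the same probability. The actual Model samples eleven time delays per target, which is not reproduced here. The sketch shows only how repeated runs turn into relative frequencies that estimate $P_0, \ldots, P_n$.

```python
import random
from collections import Counter


def simulate_raid(n: int, kill_prob: float, rng: random.Random) -> int:
    """One run of a toy raid: number of planes destroyed out of n.
    Assumption: independent kills with a common probability."""
    return sum(rng.random() < kill_prob for _ in range(n))


def estimate_distribution(n: int, kill_prob: float, runs: int, seed: int = 0):
    """Relative frequency of 'exactly i destroyed' for i = 0..n over many runs."""
    rng = random.Random(seed)
    counts = Counter(simulate_raid(n, kill_prob, rng) for _ in range(runs))
    return [counts[i] / runs for i in range(n + 1)]


if __name__ == "__main__":
    for runs in (100, 1000, 20000):
        print(runs, estimate_distribution(4, 0.5, runs))
```

For this toy model the exact answer is binomial, so the printed frequencies can be compared with 1/16, 4/16, 6/16, 4/16, 1/16 to watch the estimates settle as the number of runs grows.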


The computer samples in a random manner from each of eleven
time-delay ($T_j$) distributions. For a given raid configuration consisting,
say, of $n$ aircraft, one value of each $T_j$ is obtained for each target.
Thus, corresponding to target $A_i$, $T_1$ up to $T_{11}$ are obtained (some
$T$'s may be zero if the target fails to transit the entire system).

$A_1$:  $T_1, \ldots, T_{11}$
...
$A_n$:  $T_1, \ldots, T_{11}$

A given raid will be rerun between 50 and 100 times. These runs will
produce samples of $T_1$ to $T_{11}$ (inclusive). The factorial design
consists of 96 blocks with four replications in each block, making a total
of 384 data.

The faster the computer program, the larger the sample size and
the narrower the confidence intervals for the estimated distribution
parameters. The sampling distribution is multinomial with the following
parameters: $P_0, P_1, \ldots, P_n$, where $P_i$ is the probability that
exactly $i$ planes are destroyed and $n$ is the number of planes in the raid.
A finite number of raids will be selected to facilitate the correspondence
of the response surface of the system (in terms of kill probabilities) to
the multidimensional space of input variables.


d. The Nature and Efficacy of the Model:

This is a stochastic (probabilistic) model, since distributions
of the model parameters (the time delays) are involved. Corresponding to
a sample of size $N$ of each $T_i$, there is a regression equation:

$T_i = T_i(v, h, R_s, c, n, R_r, a)$

The parameters $v$, $h$, etc. are randomly varied in order to obtain samples.

$T_1$, say, may be given as:

$T_1 = .573\,v + .672\,h + 5.734$

The mean of each $T_i$ is estimated for given values of the parameters.



Corresponding to each raid sample of size $N$, there will
result frequencies for $P_0, P_1, \ldots, P_n$. These constitute the response
surface (with the aid of interpolation) of the system, with respect to
a raid of a specific type.

A raid configuration is characterized by the input values
of $(v, h, R_s, c, n, R_r, a)$. A set of values, one for each parameter,
$(v, h, R_s, c, n, R_r, a)$, is defined as a raid vector. This is the input
vector.

There are $N$ samples of each input vector and consequently,
at most, $N$ values of $P_0, P_1, \ldots, P_n$ for each vector. Thus, there
is a one-to-one mapping from each raid ($R_j$) to its corresponding response
surface:

$R_j \rightarrow (P_i)_j$

$(v, h, R_s, c, n, R_r, a)_j \rightarrow (P_0, \ldots, P_n)_j$

$R_j = (v, h, R_s, c, n, R_r, a)_j$

The ultimate goal is to find explicit expressions of the
output response surface $(P_0, P_1, \ldots, P_n)$ in terms of specific raid
inputs, where:

$P_i = P_i(v, h, R_s, c, n, R_r, a)$

The aim is to obtain a regression equation of kill probabilities
in terms of height, velocity, range, number of targets in a raid,
etc. This is possible only if a class of admissible raids is treated as
one ensemble.

4. Summary and Conclusions:

a. The initial design of experiments established the conceptual
framework around which the model was derived and the test designed. It
stated the test objectives, the test criteria and the generally anticipated
results.

b. A stochastic (probabilistic) model of a weapon system was
constructed to describe and predict the (time response) output
characteristics of the system for given inputs (raid configurations). This
was accomplished by partitioning the overall functions of the system into
subfunctions and their corresponding time delays, the model parameters,
and finding the mathematical relation of each time delay in terms of input
parameters (the raid characteristics). The model parameters are estimated
by the Monte Carlo method, which is essentially a combination of numerical
analysis and sampling theory.

The model contains a fixed physical system, an assignment
procedure, weapon characteristics, and a standard operating procedure.
The primary output of the model is the probability distribution of targets
destroyed for a class of admissible raids. This distribution yields the
probability that exactly $i$ targets out of a raid of $n$ attacking targets
are destroyed for each $i \le n$. All possible effectiveness criteria are
expressed in terms of the primary output.

The model contains a flow chart of a computer program
which can be coded for any computing machine, so that, given the
characteristics of a given raid and an assignment procedure, the
corresponding system response, in terms of kill probability, can be computed.

The model is flexible: as the system configuration parameters
change, the distributions of the time intervals change accordingly.
It is thus possible to gain insight into ways of improving the system.
It is to be noted that these parameters include kill probability curves.
The model will make it possible to predict the behavior of the system
when one (or several) of the time intervals are increased or decreased.

Redstone Arsenal

Oftentimes in rocketry it is necessary to conduct environmental
tests on newly developed rounds. For example, it is important to
investigate the effects of high humidity on the time required for a certain
type of rocket to travel 1000 yards. In addition to determining whether or
not high humidity affects the mean time to 1000 yards, it is very important
to know how the variance of this time is affected.

In the early stages of testing the desired experiment is a very simple
one. A certain number of control rounds (exposed to a standard humidity)
and a number of treated rounds of the same type (exposed to high humidity)
are to be fired.

The foremost problem confronting the engineer is that of determining
how many rounds he should fire in order to obtain reliable answers. He
desires a test that will closely predict behavior of the entire population
of rockets, but at the same time he can afford to test only the absolute
minimum number needed to obtain the required precision of results.

It will be assumed that samples large enough to compare two variances
are adequate for comparing the two means.

A well-known method of determining sample sizes for comparing two
variances with specified $\alpha$, $\beta$, and ratio of $\sigma_1^2$ to
$\sigma_2^2$ has been developed.

Let $\sigma_1^2$ be the true variance of the treated rounds, and let
$\sigma_2^2$ be the true variance of the control rounds. Based on
requirements of the parameter in question, it is possible to select a value
that is actually not acceptable for the ratio of $\sigma_1^2$ to $\sigma_2^2$
such that the probability of accepting $H$: $\sigma_1^2 \le \sigma_2^2$ is
$\beta$. Let us define this value as $\lambda^2$. Now if
$\sigma_1^2 \le \sigma_2^2$, the engineer wants to accept the treated rounds.
If $\sigma_1^2 > \sigma_2^2$, he wants to reject the treated rounds and
redesign. Given $\alpha$ and $\beta$, the theory for obtaining sample
sizes is as follows:


The null and alternate hypotheses are tested with the ratio of the sample
variances $s_1^2$ and $s_2^2$, where $s_1^2$ has $\nu_1$ degrees of freedom,
and $s_2^2$ has $\nu_2$ degrees of freedom.

Using an F-table, $\nu_1$ and $\nu_2$ and, hence, $n_1$ and $n_2$ are found
by trial and error. Curves have been obtained for $n_2 = k n_1$ by plotting
sample size against $\lambda^2$ for several values of $\alpha$ and $\beta$
and several values of $k$. Where $\alpha$ and $\beta$ are not the same, the
curves have been constructed with $\beta$ less than $\alpha$. This was done
because it is very frequently true in rocketry that the error of accepting
bad rounds (those having high variance) is more costly than the error of
rejecting good rounds. Notice that $n_1 = n_2$ will produce a smaller total
sample size than any other combination.
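The trial-and-error search in the F-table can be written as a single condition. The following is a sketch of the standard argument, not a transcription of the paper's own equations. It assumes that the treated rounds are rejected when $s_1^2/s_2^2$ exceeds the upper $\alpha$ point of the $F$ distribution, that $\nu_i = n_i - 1$, and that $F_p(\nu_1, \nu_2)$ denotes the point below which a fraction $p$ of the distribution lies.

```latex
% Rejection rule (assumed): reject when s_1^2 / s_2^2 > F_{1-\alpha}(\nu_1,\nu_2).
% When \sigma_1^2 / \sigma_2^2 = \lambda^2, the ratio s_1^2 / (\lambda^2 s_2^2)
% follows the F distribution with (\nu_1, \nu_2) degrees of freedom, so
\begin{align*}
P(\text{accept} \mid \sigma_1^2 = \lambda^2 \sigma_2^2)
  &= P\!\left( F \le \frac{F_{1-\alpha}(\nu_1,\nu_2)}{\lambda^2} \right) = \beta \\
\Longrightarrow\quad
\lambda^2 &= \frac{F_{1-\alpha}(\nu_1,\nu_2)}{F_{\beta}(\nu_1,\nu_2)}
           = F_{1-\alpha}(\nu_1,\nu_2)\, F_{1-\beta}(\nu_2,\nu_1).
\end{align*}
```

Under these assumptions, the required sample sizes are the smallest $n_1$ and $n_2 = k n_1$ for which the right-hand side does not exceed the chosen $\lambda^2$, which is what a search through the table accomplishes.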

Two different applications of choosing sample sizes are

1. Suppose $\sigma_1^2$ is chosen four times greater than $\sigma_2^2$. The
engineer is willing to let $\alpha$ be as high as 0.20 but desires $\beta$
to be no higher than 0.05. Also, he would like to fire three times as many
control rounds as treated rounds. To do this, he should refer to the curve
for $n_2 = 3n_1$, $\alpha = 0.20$, $\beta = 0.05$ ($n_1$ will be the number
of treated rounds in this type of problem). Then for $\lambda^2 = 4$ the
curve gives $n_1 = 12$, and $n_2 = 36$. Curves are not yet available for
$k = 1/2$, 2, 1/4, and 4, but linear interpolation between two curves will
suffice for those cases.

2. In setting up an environmental test, it is desired to compare means
by testing for a source of variation, $\sigma_b^2$, among the batches of
rockets making up the control rounds. Assuming that a batch consists of $m$
rounds, it is necessary to find the number, $b$, of batches. The analysis of
variance will be of the form


Source              E(MS)

Between batches     $\sigma_e^2 + m\sigma_b^2$
Within batches      $\sigma_e^2$
Total


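The hypotheses are

$H_0$: $\sigma_b^2 = 0$

$H_1$: $\sigma_b^2 = \gamma\sigma_e^2$, $\gamma > 0$

Under $H_0$,

$$P\left(\frac{s_b^2}{s_e^2} > F_{1-\alpha}\right) = \alpha$$

Under $H_1$,

$$P\left(\frac{s_b^2}{s_e^2} > F_{1-\alpha}\right) = P\left(F > \frac{\sigma_e^2}{\sigma_e^2 + m\sigma_b^2}\,F_{1-\alpha}\right) = 1 - \beta$$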



$\gamma$ is to be chosen in a manner similar to that by which $\lambda$ was
chosen in the preceding illustration. Now

$$\lambda^2 = \frac{E(s_b^2)}{E(s_e^2)} = 1 + m\gamma$$

An approximation for $k$ in the relation $n_2 = k n_1$ is given by

$$k \approx m - 1$$

where $b = n_1$. Having found $\lambda^2$ and $k$, and having chosen $\alpha$
and $\beta$, $b = n_1$ can be found from the curves at the end of this paper.

To give a specific illustration of this type of problem, suppose the
engineer chooses $\sigma_b^2 = 2\sigma_e^2$. That is, $\gamma = 2$. If, for
example, he chooses $m = 4$ rockets per batch (this value is arbitrary), he
obtains $\lambda^2 = 1 + (4)(2) = 9$, and $k = 3$. Also, suppose he chooses
$\alpha = 0.20$, $\beta = 0.01$. Then referring to the curve for
$n_2 = 3n_1$, $\alpha = 0.20$, $\beta = 0.01$, he finds that for
$\lambda^2 = 9$, $b = n_1 = 8$.
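The arithmetic of this illustration is short enough to check mechanically. The sketch below computes $\lambda^2 = 1 + m\gamma$ and the approximation $k = m - 1$. The function names are hypothetical, and the exact ratio $n_2/n_1$ assumes the usual one-way analysis-of-variance degrees of freedom ($b - 1$ between batches, $b(m - 1)$ within batches), which the text does not state explicitly.

```python
def variance_ratio_to_detect(m: int, gamma: float) -> float:
    """lambda^2 = E(s_b^2) / E(s_e^2) = 1 + m * gamma."""
    return 1 + m * gamma


def approximate_k(m: int) -> int:
    """Approximation k = m - 1 in the relation n_2 = k * n_1."""
    return m - 1


def exact_k(m: int, b: int) -> float:
    """Exact n_2 / n_1, assuming n_1 = b and n_2 = b * (m - 1) + 1
    (sample size taken as degrees of freedom plus one)."""
    return (b * (m - 1) + 1) / b


if __name__ == "__main__":
    m, gamma, b = 4, 2, 8
    print(variance_ratio_to_detect(m, gamma), approximate_k(m), exact_k(m, b))
```

For $m = 4$, $\gamma = 2$ this reproduces $\lambda^2 = 9$ and $k = 3$; under the stated assumption the exact ratio for $b = 8$ is slightly above 3, which is why $k = m - 1$ is described as an approximation.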

The reverse procedure of finding $m$ for a particular value of $b$ can be
accomplished by trial and error. That is, given specified values of $\alpha$,
$\beta$, and $\gamma$, different values of $m$ can be selected until the
desired value of $b$ is obtained. The curves given here are useful up to
values of $m = 6$. Tables 8.3 and 8.4 in Reference 1 may be used for larger
values of $m$ and smaller values of $\alpha$.

For additional discussion of these two types of problems, along with
operating characteristic curves of the F-test, see Reference 2.


RECOMMENDATIONS FOR THE DESIGN OF EXPERIMENTS
FOR ESTIMATING QUADRATIC REGRESSION


Pvt.  Paul  G.  Sandora 
Redstone  Arsenal 


INTRODUCTION. The following problem is treated: A total of $N$
observations, $y_1, y_2, \ldots, y_N$, may be taken at any locations, $x_i$,
in the range $x_L$ to $x_U$. The $y_i$ are uncorrelated and have common
variance $V_0$. The relation between $y$ and $x$ is

$$E(y_i) = a + bx_i + cx_i^2 \qquad (1)$$

where $E(\ )$ stands for the expectation or mean value of the symbol in
brackets. It is desired to select values of $x$ at which to observe $y$ so
that certain specific questions about the relation, Eq. 1, may be answered
as efficiently as possible. Best spacings of the $x$ are given for the
following situations:

1. Interpolation, to minimize the maximum standard error of the
estimate $\hat{y}(x)$ in the range $x_L$ to $x_U$.

2. Extrapolation, to minimize the standard error of $\hat{y}(x_0)$ for some
$x_0 > x_U$.

3. Testing $H_0$: $c = 0$.

The situation described is one frequently encountered by experimenters
in engineering or laboratory work. The variable $x$ often represents
pressure or temperature, with limits $x_L$ and $x_U$ prescribed by equipment
restrictions.

The estimation of the constants, $a$, $b$, and $c$, is made by least
squares methods which provide expressions for the standard errors, each a
function of the $x$'s selected, that are needed in answering situations 1, 2,
and 3. The least squares estimates are denoted by the symbol $\hat{\ }$.

The recommended spacings (set of $x$'s for an experiment) depend upon
a result of Garza (Reference 1) which implies, for our problem, that exactly
three distinct values of $x$ will suffice for any problem like those above.
Thus, all of the best spacings consist of three values, $x_1$, $x_2$, and
$x_3$, satisfying $x_L \le x_1 < x_2 < x_3 \le x_U$ with $n_1$, $n_2$, and
$n_3$ ($\sum n_j = N$) observations of $y$ at the corresponding values of $x$.
Values of $n_j$ will generally not be integers; care must be taken in
rounding calculated spacings off to integer values. For small $N$, a
fine-structure study may be required.


This result simplifies both the problem of selecting a spacing and the
actual calculation of the constants in a given experiment. Denote by
$\bar{y}_j$ the arithmetic mean of the $n_j$ measurements of $y$ at $x_j$.
Then it can be shown that the least squares estimate of Eq. 1 passes through
the three points $(x_j, \bar{y}_j)$, $j = 1, 2, 3$. Thus the least squares
estimate can be written in the Lagrange form

$$\hat{y}(x) = \bar{y}_1\frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} + \bar{y}_2\frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} + \bar{y}_3\frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)} \qquad (2)$$

which may be written

$$\hat{y}(x) = \sum_{j=1}^{3} T_j(x)\,\bar{y}_j \qquad (3)$$

with an obvious notation. We have at once

$$\mathrm{Var}[\hat{y}(x)] = V_0 \sum_{j=1}^{3} [T_j(x)]^2\,\frac{1}{n_j} \qquad (4)$$

We now proceed to consider the three problems separately.
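The Lagrange form and the variance formula, Eq. 4, translate directly into code. The following is a minimal sketch; the function names are hypothetical, and exact fractions are used only so that the results can be compared with hand calculation.

```python
from fractions import Fraction


def lagrange_weights(x, points):
    """Coefficients T_j(x) of the three group means in the Lagrange form."""
    weights = []
    for j, xj in enumerate(points):
        w = Fraction(1)
        for k, xk in enumerate(points):
            if k != j:
                w *= Fraction(x - xk) / Fraction(xj - xk)
        weights.append(w)
    return weights


def variance_factor(x, points, counts):
    """Var[y_hat(x)] / V0 from Eq. 4, with counts n_j at each point x_j."""
    weights = lagrange_weights(x, points)
    return sum(w * w / n for w, n in zip(weights, counts))


if __name__ == "__main__":
    points, counts = (-1, 0, 1), (4, 4, 4)      # N = 12 observations
    for x in (Fraction(0), Fraction(1, 2), Fraction(1)):
        print(x, lagrange_weights(x, points), variance_factor(x, points, counts))
```

At a design point $x_j$ the weights reduce to a single 1, so the variance there is simply $V_0/n_j$, a fact the interpolation argument relies on.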

INTERPOLATION. In this case, we wish to know Eq. 1 as well as possible
over the range $x_L$ to $x_U$. This may be done by finding a spacing which
minimizes the maximum $\mathrm{Var}[\hat{y}(x)] = V_{max}$ for
$x_L \le x \le x_U$. A simple derivation of this spacing was given by Garza
(Reference 1). Note from Eq. 4 that

$$\mathrm{Var}[\hat{y}(x_j)] = \frac{V_0}{n_j}, \quad j = 1, 2, 3 \qquad (5)$$

Then

$$3V_{max} \ge \sum_{j=1}^{3} \mathrm{Var}[\hat{y}(x_j)] = V_0 \sum_{j=1}^{3} \frac{1}{n_j}$$

The minimum value of $\sum_{j=1}^{3} 1/n_j$ constrained by
$\sum_{j=1}^{3} n_j = N$ is $9/N$ with $n_j = N/3$. Hence,

$$\min V_{max} \ge \frac{3V_0}{N} \qquad (6)$$

Now if the symmetric spacing $x_1 = x_L$, $x_2 = (x_L + x_U)/2$,
$x_3 = x_U$, and $n_1 = n_2 = n_3 = N/3$ is used, Eq. 4 has a differentiable
maximum at $x_2$, equal to $3V_0/N$. At both $x_L$ and $x_U$, Eq. 4 is
increasing as the end of the interval is approached from within, and Eq. 4 is
equal to $3V_0/N$ at the ends. Therefore, $V_{max} = 3V_0/N$. Hence, the
spacing is the desired one since it produces the equality of Eq. 6. Thus the
best spacing for interpolation is to take $N/3$ observations at each of
$x_L$, $(x_L + x_U)/2$, and $x_U$.
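The claim that the symmetric spacing attains $V_{max} = 3V_0/N$ can be checked numerically by scanning Eq. 4 over the range. The sketch below does this on a grid for $x_L = -1$, $x_U = 1$; the grid size and the second, unequal allocation are arbitrary choices made only for comparison.

```python
def lagrange_weights(x, points):
    """Coefficients T_j(x) of the group means in the Lagrange form."""
    weights = []
    for j, xj in enumerate(points):
        w = 1.0
        for k, xk in enumerate(points):
            if k != j:
                w *= (x - xk) / (xj - xk)
        weights.append(w)
    return weights


def scaled_variance(x, points, fractions):
    """N * Var[y_hat(x)] / V0 from Eq. 4, with fractions p_j = n_j / N."""
    weights = lagrange_weights(x, points)
    return sum(w * w / p for w, p in zip(weights, fractions))


def max_scaled_variance(points, fractions, lo, hi, steps=2000):
    """Largest value of the scaled variance on an evenly spaced grid."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(scaled_variance(x, points, fractions) for x in grid)


# Equal allocation at the ends and the midpoint, versus an unequal one.
best = max_scaled_variance((-1.0, 0.0, 1.0), (1 / 3, 1 / 3, 1 / 3), -1.0, 1.0)
skewed = max_scaled_variance((-1.0, 0.0, 1.0), (0.25, 0.5, 0.25), -1.0, 1.0)
print(best, skewed)
```

The equal allocation gives a maximum of about 3 (that is, $3V_0/N$), while moving observations from the ends to the middle raises the maximum to about 4, attained at the end points.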

The problem is solved, but a few comments are in order. An important
objection to the "best" spacing is that it offers no information about
inadequacies in the model, Eq. 1. Thus we wish to examine the sensitivity of
the best spacing to variations which allow tests of the adequacy of the
quadratic model. Consider the spacings $S_1$ and $S_2$, where $x_L = -1$,
$x_U = 1$ are used for simplicity and $p_j = n_j/N$.

S_1:   p    1/4    1/8    1/4    1/8    1/4
       x    -1    -1/2     0     1/2     1

S_2:   p    1/8    1/8    1/8    1/8    1/8    1/8    1/8    1/8
       x    -1    -5/7   -3/7   -1/7    1/7    3/7    5/7     1

The ratio of the standard errors of the two spacings to the standard error
of the best spacing is shown for several values of $x$. (Standard
error = $\sqrt{\mathrm{Var}[\hat{y}(x)]}$)

x       0      ±0.2   ±0.4   ±0.6   ±0.8   ±1.0

S_1     0.93   0.93   0.95   0.97   1.05   1.11

S_2     0.87   0.87   0.89   0.97   1.20   1.37

It appears that $S_1$ compares quite well with the best spacing, but $S_2$ is
exceedingly weak at the ends. This indicates that for large $N$ the
often-used practice of taking observations an equal distance apart sacrifices
much accuracy at the end points.
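Ratios of this kind can be recomputed for any proposed spacing from the normal equations of the quadratic fit. The sketch below is one way to do so, assuming the usual least squares result that $N\,\mathrm{Var}[\hat{y}(x)]/V_0 = f(x)^T M^{-1} f(x)$, with $f(x) = (1, x, x^2)$ and $M = \sum p_i f(x_i) f(x_i)^T$. The helper names are hypothetical, and recomputed values may differ from the printed table in the second decimal place.

```python
from fractions import Fraction
from math import sqrt


def solve(matrix, rhs):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def variance_factor(x, design):
    """N * Var[y_hat(x)] / V0 for a design given as (x_i, p_i) pairs."""
    x = Fraction(x)
    f = [Fraction(1), x, x * x]
    info = [[sum(p * xi ** (r + c) for xi, p in design) for c in range(3)]
            for r in range(3)]
    z = solve(info, f)
    return sum(fi * zi for fi, zi in zip(f, z))


def standard_error_ratio(x, design, reference):
    """Standard error of `design` at x relative to `reference`."""
    return sqrt(variance_factor(x, design) / variance_factor(x, reference))


F = Fraction
BEST = [(F(-1), F(1, 3)), (F(0), F(1, 3)), (F(1), F(1, 3))]
S1 = [(F(-1), F(1, 4)), (F(-1, 2), F(1, 8)), (F(0), F(1, 4)),
      (F(1, 2), F(1, 8)), (F(1), F(1, 4))]
S2 = [(F(k, 7), F(1, 8)) for k in (-7, -5, -3, -1, 1, 3, 5, 7)]

if __name__ == "__main__":
    for x in (F(0), F(1, 5), F(2, 5), F(3, 5), F(4, 5), F(1)):
        print(float(x), round(standard_error_ratio(x, S1, BEST), 3),
              round(standard_error_ratio(x, S2, BEST), 3))
```

The same functions can be used to try other candidate spacings before an experiment, for example to see how much end-point accuracy is given up in exchange for interior points that permit a test of the quadratic model.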

In conclusion, the recommended procedure is to use a spacing which
will allow detection of inadequacies in the quadratic model but which does
not greatly increase $V_{max}$.

EXTRAPOLATION. If the experiment is performed to determine the estimate
$\hat{y}(x_0)$ for $x_0$ outside the interval $(x_L, x_U)$, the problem is
one of extrapolation. We shall assume that $x_0 > x_U$, but the solution for
$x_0 < x_L$ can be obtained easily from the solution for $x_0 > x_U$.

A brief justification is given for the results. Equation 2 becomes

$$\hat{y}(x_0) = \sum_{j=1}^{3} T_j(x_0)\,\bar{y}_j \qquad (7)$$

where

$$T_j(x_0) = \prod_{k \ne j} \frac{x_0 - x_k}{x_j - x_k}$$

so that, from Eq. 4,

$$\mathrm{Var}[\hat{y}(x_0)] = V_0 \sum_{j=1}^{3} \frac{[T_j(x_0)]^2}{n_j} \qquad (8)$$

Then the $n_j$ which minimize Eq. 8 are

$$n_j = \frac{N\,|T_j(x_0)|}{\sum_{k=1}^{3} |T_k(x_0)|} \qquad (9)$$

The values of $x$ can be shown to be, again,

$$x_1 = x_L, \quad x_2 = \frac{x_L + x_U}{2}, \quad x_3 = x_U \qquad (10)$$

Thus, to minimize the variance of Eq. 7, $n_j$ is determined from Eq. 9 with
$x_j$ given by Eq. 10. As an example, suppose $x_1 = x_L = 10$,
$x_3 = x_U = 20$, $x_2 = 15$, and $x_0 = 25$. Then

$n_1 = 0.143N$
$n_2 = 0.428N$
$n_3 = 0.429N$

Then, from Eq. 8,

$$\mathrm{Var}[\hat{y}(x_0)] = \frac{49V_0}{N}$$

If we want this equal to, say, the variance of a single observation, $V_0$,
then $N = 49$ observations will be required. This number seems quite large
when it is remembered that the same precision can be attained from $x_L$ to
$x_U$ with just three observations.
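The numbers in this example follow from the Lagrange coefficients at $x_0$. The sketch below reproduces them; it assumes the allocation rule $n_j \propto |T_j(x_0)|$ and uses the fact that, for this allocation, $N\,\mathrm{Var}[\hat{y}(x_0)]/V_0$ equals $(\sum_j |T_j(x_0)|)^2$. Function names are hypothetical.

```python
from fractions import Fraction


def lagrange_weights(x0, points):
    """Coefficients T_j(x0) of the group means in the Lagrange form."""
    weights = []
    for j, xj in enumerate(points):
        w = Fraction(1)
        for k, xk in enumerate(points):
            if k != j:
                w *= Fraction(x0 - xk) / Fraction(xj - xk)
        weights.append(w)
    return weights


def best_allocation(x0, points):
    """Return (n_j / N for each point, N * Var[y_hat(x0)] / V0 at the optimum)."""
    weights = lagrange_weights(x0, points)
    total = sum(abs(w) for w in weights)
    fractions = [abs(w) / total for w in weights]
    return fractions, total ** 2


if __name__ == "__main__":
    fractions, factor = best_allocation(25, (10, 15, 20))
    print([float(p) for p in fractions], factor)
```

For the example the coefficients are 1, -3, and 3, so the allocation is 1/7, 3/7, 3/7 of $N$ and the variance is $49V_0/N$. Trying other values of $x_0$ shows how quickly the required $N$ grows as the point of extrapolation moves away from the interval.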


In summary, in planning an experiment where extrapolation is
unavoidable, even the best spacing often requires a large number of
observations; simple calculations like these should be made to determine
beforehand what may be expected from proposed extrapolation.

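TESTING THE HYPOTHESIS ABOUT c. For large $x_0$, $\mathrm{Var}[\hat{y}(x_0)]$
approaches its dominant term, $\mathrm{Var}(\hat{c})\,x_0^4$. Hence, for large
$x_0$, when $\mathrm{Var}[\hat{y}(x_0)]$ is minimized, $\mathrm{Var}(\hat{c})$
is also minimized. Letting $x_0$ grow large in Eq. 8, we find

$$n_1 = \frac{N}{4}, \quad n_2 = \frac{N}{2}, \quad n_3 = \frac{N}{4} \qquad (11)$$

This is the spacing that minimizes $\mathrm{Var}(\hat{c})$; hence it yields
the best test for hypotheses about $c$. The minimum value of
$\mathrm{Var}(\hat{c})$ is

$$\mathrm{Var}(\hat{c}) = \frac{64V_0}{N(x_U - x_L)^4} \qquad (12)$$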

SUMMARY. Best spacings have been given for three common situations in
quadratic regression. The spacings are often different from those commonly
used. They depend on obtaining several independent observations at the
same value of $x$. Where the cost of an observation is independent of $x$,
these spacings are minimum cost spacings. More general considerations in
the design of regression experiments are found in References 1, 2, and 3.
Reference 4 gives best spacings for estimating straight lines. Reference 5
gives some of these results together with a discussion of rate of
subsampling.


A WIDE BAND TELEMETERING SYSTEM


R.  A.  Parkhurst 

Diamond  Ordnance  Fuze  Laboratories 

In making chaff reflection studies several methods have been employed.
As examples, one method synthesizes chaff by using randomly spaced pins in
waveguide. Another involves dropping chaff piece by piece in still air,
making reflection measurements and integrating all such data into the
composite signal which would occur if all pieces were dropped simultaneously.

Reflection studies are not only difficult due to the problem of
synthesizing chaff echo signals but are also further complicated by the type
of signal being reflected. If a cloud of chaff is in the air and a pulse
radar beam is swept through it, the echo amplitude and stretching will be one
amount when the beam is aimed directly at the chaff, but when the beam
strikes only the side of the chaff cloud, the stretching and amplitude will
be of a different value.

In the event that cw is used instead of pulses the reflection will
vary with respect to the position of the chaff in the antenna pattern, or
the rate at which the chaff is passing through the pattern.

To synthesize such conditions in the laboratory is quite difficult,
if not impossible, and mathematical analyses become so complex that they
produce little more than very general results.

Two methods are available for obtaining genuine reflections from
chaff. One of these is to use a supersonic sled facility, several of
which are available at test stations throughout the country. In this type
of test a fuze is mounted on a sled which is driven by rocket motors
and reaches speeds up to 2000 feet per second. Various targets may be
placed along the side of the track and signals from them will be
telemetered to the receiving station. For chaff studies, chaff may be
dispensed over the track, either by aircraft or by mortar shell.

Several complications are involved in this type of test. The main
one is that the fuze must be made insensitive to ground echoes. This
requires altering the fuze radiation pattern on the side towards the
ground, which may cause distortion of the signal either by general
misshaping of the pattern or by producing erroneous signals in the fuze
due to the imperfection of the shielding material used. Multiple
reflections generated between the chaff and the ground may also be
present, and these can also create errors in the data obtained.

The most realistic method of obtaining chaff reflections is to fire
a test vehicle past chaff in the sky. Tests of this nature have been
performed and good results have been obtained. These tests were set up
for a specific purpose, namely, chaff reflections as seen by one type of
fuze, and so the data obtained generally is applicable to only one type of
fuzing. The method used in setting up and performing these tests is quite
similar to a regular missile flight test except that special fuze
telemetering is employed, and a drone with a chaff dispenser is used.

Figure 1 is a diagram of a typical arrangement for a chaff test. The
shaded area is the ocean firing range. The control stations, telemetering
ground stations, and landing fields are located on the land at the right.

The target plane flies the marked course. Its air speed is 310 feet per
second and it carries what may be called a standard chaff dispenser.

The launch aircraft follows a similar course and thus makes a tail-on
approach to the target. Its velocity is up to 660 ft./sec., and the timing
check points are to assure proper location of the planes during the test.

The plane locations are plotted on radar plotting boards at the control
station, and if either the drone or launcher is not at the prescribed
point at the proper time, corrective directions are given to bring them back
on course. Both planes fly at 8000 feet altitude, and the missile is
launched at the point marked 0 time, about 3,500 yards from the target.

A camera plane flies 2,000 feet to the left of the launch aircraft and
slightly down and to the rear. This plane carries cameras which are
boresighted with the plane's guns, thus being aimed by the gunsights. Other
cameras take pictures through the windshield, and in some cases hand-held
cameras are used for extra coverage.

The launch aircraft also has several cameras. One has a telephoto
lens for close-up pictures of the intercept. One is boresighted to the
launcher on the plane and sees the missile and target with respect to that
angle. A third covers the operation through the windshield.

When a test is performed, the drone, launcher, camera plane, and other
test aircraft take off at -30 minutes. A TM check is made with the ground
station, and a dry run is made against the target. The planes are then
repositioned and the test proceeds. Positions and direction are called out
by the radar control station. At -1 minute a final position check is made
and if all is well, all operation personnel are notified. At -6 seconds all
TM equipment is started in the ground station. At -3 seconds the photo
plane cameras are started and the chaff dispenser in the drone is turned on.
At -1.5 seconds the launch plane cameras are started and at 0 time the pilot
fires the missile. At +15 seconds the photo plane cameras stop and at +25
seconds the launcher cameras stop. A post-launch check is made with the
launch plane making runs against another drone with a pilot in it for radar
calibration purposes. Cameras are also placed in wing pods on the drones
to obtain more precise intercept data.

The physical execution of the test is only part of the entire
operation. Success or failure depends on successful operation of aircraft
and missile, skillful performance of operating personnel, and proper
operation of a rather complex telemetering and recording system.

In order to obtain signals from clouds of chaff, a test vehicle with
a fuze must be launched and guided through the chaff dispensed by the target
aircraft. As the missile passes by the chaff and target, any signal
delivered by the fuze receiver is fed to the telemetering system. Thus,
on one test, echoes from several clouds of chaff and one target are obtained.


Figures appear at end of the article.

A special telemetering system is used which transmits the signals to the
ground station. Calculations have shown that frequency components from
dc to 100 kc may be present and that the TM system should have a bandwidth
as wide as this. Not only must this relatively large bandwidth be
accommodated, but the airborne portion of the system must be able to
withstand violent vibration. The transmitter, when subjected to vibration
equivalent to that expected in flight, must not fail mechanically, and it
must not generate spurious noise which could be confused with the desired
signals.

In order to meet these telemetering requirements two approaches were
possible. One, to develop an entire system which fully met the
requirements, was rejected as being too time consuming and expensive. The
other was to select and use any commercially available equipment which most
nearly met the needs.

Figure 2 shows a standard multichannel FM-FM telemetering system which
was investigated for usable components and techniques. This system consists
of a crystal controlled VHF transmitter which is either phase-modulated or
frequency-modulated by subcarriers. The transmitter accepts modulation up
to 100 kc, and the subcarriers have various frequencies ranging from a few
hundred cycles up to 70 kc. Each subcarrier is frequency modulated with a
signal which is to be telemetered. The subcarrier frequencies are chosen
so that when each is deviated ±15% from its center frequency, none of the
modulation sidebands will interfere with the subcarriers adjacent to it in
the spectrum. Also, any harmonics generated by non-sinusoidal subcarrier
oscillators are filtered out before applying the signal to the transmitter
in order to avoid interference.

The ground station has a VHF receiver which is tuned to the transmitter
frequency. The output of this receiver is fed into a bank of filters and
discriminators so that each subcarrier is separated out and fed to a
discriminator tuned to its own center frequency. The discriminator demodulates
the subcarrier, and in this manner the information applied to each subcarrier
is recreated in the ground station.

The center frequency of each subcarrier more or less determines the
bandwidth of the signal which may be applied to it. (For instance, if a
40 kc subcarrier is deviated ±15%, the modulation frequency allowable for
an index of 5 would be 1,200 cycles. Deviations of greater than ±15% would
create sidebands which would interfere with adjacent subcarrier signals,
and to maintain good signal to noise ratios it is advisable to keep
modulation indices above 5. Thus, in normal usage the maximum frequency
applied to a 40 kc subcarrier is 1,200 cycles.) The greatest bandwidth
available for a subcarrier is dc to 20 kc. This may be obtained with a
70 kc subcarrier and using a modulation index of 1. This not only
lowers the signal to noise ratio considerably, but requires modification
of the subcarrier and the subcarrier discriminator.
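The arithmetic behind the 40 kc example can be checked with a short sketch. The function name and argument names below are illustrative assumptions, not part of the original paper; the relation used is the usual one, in which the modulation index equals the peak deviation divided by the modulating frequency.

```python
def max_modulating_frequency(center_hz, deviation_fraction, modulation_index):
    """Highest modulating frequency a subcarrier can carry.

    Peak deviation   = center frequency * fractional deviation.
    Modulation index = peak deviation / modulating frequency.
    """
    peak_deviation_hz = center_hz * deviation_fraction
    return peak_deviation_hz / modulation_index


# The example from the text: a 40 kc subcarrier deviated +/-15%, index of 5.
f_max = max_modulating_frequency(40_000, 0.15, 5)
print(round(f_max))  # 1200
```

Lowering the index raises the usable modulating frequency in direct proportion, which is the trade against signal to noise ratio described above.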

This maximum of 20 kc was not sufficient for the chaff tests, so it
was decided to investigate direct modulation of a transmitter.

A phase modulated crystal controlled transmitter was checked for
suitability. In a transmitter tested, a crystal oscillator ran at about
20 mc, and this frequency was multiplied up to the desired carrier frequency
by several multiplying stages. One of the multiplier tuned circuits was
tuned by a reactance tube, the reactance of which was varied by the
modulating signal. This, in turn, produced a phase lead or lag in that
particular stage and thus phase-modulated the carrier at that point. The phase
modulation was then multiplied along with the carrier until the output at
the desired frequency was obtained with its phase varying proportionally
with the modulation signal.

In this type of transmitter, if a linearly rising signal is applied
as modulation, the phase will advance continually at a linear rate. If
the carrier is observed during this period, it will be noted that as long
as the phase advances, the frequency will be increased. That is, the
carrier will be some steady value above its normal unmodulated frequency.

If this signal is detected in an FM discriminator or ratio detector, we
will get a constant voltage output. The modulation applied, however, is
a sawtooth, or linearly rising signal, so it is apparent that a
phase-modulated signal, when detected by an FM receiver, will be differentiated.

A further example of this would be to apply a square wave to the
transmitter. When the input changes from its negative value to its
positive, the phase of the carrier is shifted by some amount depending on
the amplitude of the applied signal. As long as the input voltage
remains constant, as during 1/2 cycle of the square wave, the carrier
remains at its unmodulated frequency, but advanced or retarded in phase.

In this case, a discriminator or ratio detector would see the same frequency
at all times except when the carrier was shifting from one phase reference
to another. During these shifts, the discriminator would deliver pulses
proportional to the rate of change in phase. These pulses are, again, the
derivative of the applied modulation signal. Figure 3 shows clearly this
differentiating action between the input and output of a PM transmitter with
square wave modulation applied. In normal FM-FM usage this differentiating
action is of little importance since the information being conveyed is
strictly the individual subcarrier frequencies and not their waveforms.
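The differentiating action described above can be illustrated numerically. The sketch below is not from the original paper: it assumes an ideal FM detector whose output equals the rate of change of carrier phase divided by 2*pi, and the sample spacing and phase values are arbitrary illustrative choices.

```python
import math


def fm_detector_output(phase_samples, dt):
    """Ideal FM detector: frequency offset (cycles/s) = (1 / 2*pi) * d(phase)/dt."""
    return [
        (phase_samples[i + 1] - phase_samples[i]) / (2 * math.pi * dt)
        for i in range(len(phase_samples) - 1)
    ]


dt = 1e-4   # sample spacing in seconds (illustrative)
n = 20      # number of phase samples

# Case 1: linearly rising modulation, so the phase advances at a constant rate.
# A slope of 2*pi*100 rad/s corresponds to a steady 100 cycles/s offset.
ramp_phase = [2 * math.pi * 100 * i * dt for i in range(n)]
ramp_out = fm_detector_output(ramp_phase, dt)

# Case 2: square-wave modulation, so the phase jumps once between two fixed values.
square_phase = [-1.0] * 10 + [1.0] * 10
square_out = fm_detector_output(square_phase, dt)

print(ramp_out[0])                          # steady offset, close to 100
print([i for i, v in enumerate(square_out) if v != 0])  # only the transition sample
```

The ramp produces a constant output and the square wave produces a single pulse at the transition, which is the derivative behavior the text attributes to Figure 3.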

For wideband purposes, namely in the desire to preserve wave forms, this
characteristic is quite a hindrance.

Figure 4 shows another feature of PM transmitters, namely the sloping
frequency response curve. Since the frequency generated by shifting the
carrier is what the receiver detects, the slower the phase is shifted, the
lower will be the effective deviation. This, in effect, holds the modulation
index constant, which results in the characteristic that as the modulation
frequency decreases, the amplitude of the detected signal decreases. This
creates a frequency versus amplitude response which is quite poor when
compared with that of an FM transmitter. The frequency response below
one kc is generally too low to be useful, and applying larger input voltages
at the lower frequencies results in severe distortion. This diagram shows
flattening of the response curve above 10 kc which is due to an integrating
network across the input. This effectively attenuates the high frequencies,
thereby reducing the modulation index as the frequency is increased. One
method for increasing low frequency response is frequency-modulating the
crystal at low frequencies. The crystal cannot be "pulled" very far, but
dc response has been obtained and a curve as shown by the dotted line was
achieved.

Frequency modulated transmitters of the non-crystal-controlled
type were also tested. In the past, TM equipment manufacturers have
produced many FM transmitters for use in FM-FM TM systems. With more and
more tests being conducted simultaneously and more and more data on the
air, crystal control to keep one transmitter from drifting into another's
channel has become a must, and non-crystal-controlled transmitters have
fallen into general disuse.
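The sloping response of the phase-modulated transmitter discussed above follows from the fixed modulation index: the peak frequency deviation seen by an FM detector is the index times the modulating frequency. A minimal sketch under that assumption is given here; the function name and the `volts_per_hz` detector constant are hypothetical.

```python
def detected_amplitude(mod_freq_hz, modulation_index, volts_per_hz=1.0):
    """Relative FM-detector output for a phase-modulated carrier.

    Peak frequency deviation = modulation index * modulating frequency,
    and the detector output is taken as proportional to that deviation.
    """
    return volts_per_hz * modulation_index * mod_freq_hz


# With the index held constant, output falls in step with modulating frequency.
for f in (100, 1_000, 10_000):
    print(f, detected_amplitude(f, modulation_index=1.0))
```

Under this idealization a tenfold drop in modulating frequency gives a tenfold drop in detected amplitude, which is why the response below about one kc is described as too low to be useful.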

For wideband use the major drawback in these transmitters, aside from
lack of crystal control, was their lack of rigid construction. In almost
all cases when the transmitters were subjected to vibration such as to be
encountered during the test, noise would be generated in quantities equal
to or greater than the signal being telemetered. Several transmitters of
this type have been strengthened mechanically and vibration tested. From
a small group of mechanically sound transmitters several test records have
been obtained which have not been bettered by any other transmitter. Figure
5 is a response curve of an FM transmitter. The frequency response of an
FM transmitter is quite good. By modifying the input circuitry, dc response
can be obtained in some models.

Figure 6 shows that phase-shifting of the modulating signal is quite
low and that good output wave form fidelity is maintained. The square
wave response of an FM transmitter is shown in this diagram. This good
frequency-response and phase-response also applies to crystal stabilized
transmitters, which are more desirable for both mechanical and stability
reasons.

In a crystal stabilized FM transmitter, an oscillator on the order of
30 mc is modulated with a reactance tube, and its frequency multiplied
up to the VHF region. At the point where the oscillator frequency is
doubled, a portion of it is mixed with a signal from a crystal oscillator,
and the difference frequency is fed to a discriminator. The output of the
discriminator is returned to the modulating reactance tube and thus tries
to keep the difference between the transmitter oscillator and crystal
oscillator at a fixed amount. The output frequency is thereby maintained
fixed at almost crystal accuracy. The degree to which the oscillator
frequency is held constant depends upon the frequency response of the
correction network from the discriminator to the reactance tube. If a
very long time constant is employed in this feedback loop, it will take
a relatively long time for the discriminator to shift the oscillator
back to its center frequency after a dc step has been applied to the
modulation terminals. With this type of system it is evident that dc
response can never be achieved, but response down to a few cycles is
readily attained.
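The effect of the feedback time constant can be sketched with a simple simulation. The model below is an assumption made for illustration only and is not taken from the paper: the correction is treated as a first-order loop that slowly integrates the frequency offset, and the time constant of 0.5 second is an arbitrary value.

```python
import math


def step_response(tau_s, duration_s, dt_s=1e-3, step=1.0):
    """Oscillator frequency offset after a dc step at the modulation input.

    Hypothetical first-order model: the correction c(t) slowly integrates the
    offset x(t) = step - c(t), with dc/dt = x / tau.
    """
    c = 0.0
    offsets = []
    for _ in range(int(round(duration_s / dt_s))):
        x = step - c
        offsets.append(x)
        c += x * dt_s / tau_s  # forward-Euler update of the correction
    return offsets


tau = 0.5                                  # loop time constant, seconds (illustrative)
offsets = step_response(tau, duration_s=5 * tau)
corner_hz = 1.0 / (2 * math.pi * tau)      # approximate low-frequency cutoff of this model

print(offsets[0], offsets[-1] < 0.01)      # step appears at first, then is pulled back
print(round(corner_hz, 2))                 # modulation well above this passes through
```

In this idealization the step decays toward zero, so there is no dc response, while a longer time constant pushes the low-frequency cutoff down toward a few cycles or less.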

The particular transmitter tested was quite insensitive to vibration.
Of all types checked, it had the most desirable characteristics with a
minimum of extra work necessary. This type of transmitter was selected
for use in the final TM system. One drawback found later was that in the
transmitter selected, the tubes were run over rating and thus had a
considerably shortened life. This didn't matter too much during tests since
a flight lasts only a few minutes, but during the hours of TM calibration
and checking, at least one transmitter has been run beyond its useful life.

Another factor in the selection of transmitters or, for that
matter, any item, is the environment in which it has to live. An example
of what can happen is pointed up by a transmitter built by DOFL and flown
in a J>M rocket. It was not at all ruggedly constructed and when flown, was
surrounded by one inch thick foam rubber. This transmitter produced almost
noise free records from many rocket firings, yet when tested on a vibration
table under conditions expected to be encountered in our tests, it was not
only extremely noisy, but rapidly fell apart.

The vibration test given to all transmitters was ten g's in three
planes from 20 cycles to 500 cycles. The output of a receiver tuned to
the transmitter frequency was observed during the shake test and, with no
input, the output was to remain at less than 5% of the output observed with
a maximum allowable modulation signal. If more than 5% noise was observed,
the transmitter was rejected.

After selecting the crystal stabilized FM transmitter, it was necessary
to find suitable ground station equipment. The receiver was by far the
easiest part to choose in setting up the wideband TM system. A standard
VHF telemetering receiver was checked for frequency response and found
to be good from a few cycles to 100 kc. In the event that dc response is
eventually obtained in a transmitter, the receiver can easily be modified
to deliver dc.

After transmitting the signal to the ground and detecting it, the
problem was to record it so that it could be regenerated electrically.
From photographic film, this is quite impractical, if not presently
impossible.

Wideband tape recorders are made for standard FM-FM systems and are
available with bandwidths from 200 cycles to 80 or 90 kc. A typical tape
recorder response curve is shown in Figure 7. As video recorders, these
machines produce a differentiating action similar to a PM transmitter since
when a tape is played back, the voltage generated is proportional to the
rate of change of flux on the tape. Figure 8 shows a tape recorder phase
distortion. There is also a more or less mechanical phase distortion;
this is produced by the phenomenon that as the recording frequency is
increased, the position of the maximum flux in the recording head gap will
move in its relative position, locating itself physically closer to one of
the poles.

The net result is that although a tape recorder has fairly good
frequency response, and can be modified to go down to 50 cycles, it has
relatively poor phase fidelity and will distort wave forms with high
harmonic content rather severely.

There are available carrier type recording systems which
frequency-modulate a carrier and record this carrier on the tape. This system
produces a practically flat response from dc to 10 kc. By altering
existing units and sacrificing some of the 40 db signal to noise ratio,
bandwidths up to 20 kc can be obtained.

A very promising device in the recording field is a new video
recorder designed for television use. It has a flat frequency response
from 20 cycles up to 4 megacycles. This device, using a one mc carrier
system, for instance, could easily record from dc to 100 kc. At the time
of our testing, the video recorder was not available, and the previously
mentioned carrier system did not have enough bandwidth, so a standard
FM-FM system recorder was used. The signals received turned out to be low
in harmonic content, so it was felt that little distortion was present.

Also, the lack of dc response was partially compensated for by recording the
signal at the receiver directly on film. This gave a visual record of the
signal down to the low frequency limit of the transmitter. Film records
are also made for visual inspection of signal wave forms. During the test
the signal from the receiver is photographed on both high speed and slow
speed films.

There are problems encountered in attempting this direct recording.
First of all, if fair resolution of the signal is desired, the film speed
must be a minimum of 400 inches per second. For good resolution the speed
must be greater, approaching 100 feet per second. In this event, to cover
a ten-second test, a 1000 ft. camera capacity would be required, and
available Fastax or Eastman high speed cameras hold only a hundred feet of film.
Perhaps with extremely accurate timing, the precise second of encounter
could be recorded, but no reliable method for starting the camera at the
right moment is available.
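The film budget in the paragraph above is simple arithmetic, worked here as a short sketch. The function names are illustrative; the speeds, test duration, and camera capacity are the figures quoted in the text.

```python
def film_needed(speed_per_s, duration_s):
    """Film length consumed at a constant transport speed (same length unit as speed)."""
    return speed_per_s * duration_s


def run_time_s(capacity, speed_per_s):
    """How long a loaded camera lasts; capacity and speed share one length unit."""
    return capacity / speed_per_s


print(film_needed(100, 10))       # 1000 ft for a ten-second test at 100 ft/s
print(run_time_s(100, 100))       # 1.0 s from a 100 ft camera at 100 ft/s
print(run_time_s(100 * 12, 400))  # 3.0 s from the same camera at 400 in/s
```

A hundred-foot camera therefore covers between one and three seconds of a ten-second test, which is why precise starting of the camera becomes the controlling problem.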

A Miller high speed oscillographic recorder is available which records
on photographic paper. This device has a paper speed of 400 inches per
second and a capacity of 12 seconds running time. Although the speed is
lower than that desirable for good visual records, usable records are
practically guaranteed with a minimum of timing problems. This machine is
presently being modified to run at 800 inches per second, and the film
magazine capacity is being doubled.

Records are also made on 35 mm. film running at 60 inches per second.
These films show overall characteristics and existence of signals, but are
not of too much value for anything else since wave forms cannot be
distinguished, frequencies cannot be measured, and they cannot be "played back"
electrically.

Figure 9 shows the entire wide band telemetering system. This consists
of a crystal stabilized FM transmitter which is modulated with the signal
from the fuze receiver. The particular transmitter chosen was selected
for its good frequency response and freedom from microphonics. The
receiver is a standard FM receiver. The signal is recorded on tape for
playback analyses. It is also recorded on high speed film for visual
waveform analyses and on slow speed film for visual inspection of general
overall envelope structure.

Good results have been obtained from several flight tests in which
characteristics of chaff echoes have been easily discernible from aircraft
echoes. Unfortunately, since these differences in characteristics apply to
specific fuzes, they cannot be discussed in this paper.

In choosing a topic for discussion, I sought a subject which I felt
would be of timely interest, relatively clear cut, and generally without
a great deal of controversy about it. From a layman's point of view, the
topic of automation seemed to fill the bill. I was extremely naive in my
choice. Fortunately I chose to deliver this presentation in a clinical
session of this conference. I do so utilizing that definition of clinical
session which allows the presentation of a problem area with no answers or
solutions required from the speaker.

My interest in automation is in the man-machine integration involved
in a complex system. Contrary to the layman's popular conception of
automation, which is essentially the pushing of a start and stop button in
response to a red or green light, there may be a more intricate
relationship involved. With this thought in mind, I diligently began what I
intended to be an intensive literature review.

Much of the published literature on automation is concerned with
semantic arguments of definition, economics, and the pros and cons of the effects
of either a benefactor or monster on society, depending upon whether
management or labor was speaking. Only one point of commonality appeared to
exist. The major portion of the material in the literature began with a
definition. The definitions were varied and not always in total agreement.
As examples, some of the definitions were:

1. Automation means automatic control. (1)

2. The substitution of mechanical, pneumatic, hydraulic, electrical
and electronic devices for human organs of decision and effort.

3. The science and art of manufacturing products with minimum labor,
effort and cost, and maximum efficiency.

4. The elimination of repetitive, onerous, dangerous and trivial labor,
mental or physical, from the realm of human endeavor.

5. The way to a society in which labor is necessary not for the
physical needs of the body, but for the creative needs of the soul.

In a final definition for purposes of this discussion, a distinction is made
between mechanization, which replaces or amplifies human brawn, and
automation, which supplements the human brain through the inclusion of feedbacks
or self-correcting devices. (3)

Following the established pattern of defining the term, an operational
definition for purposes of discussion is proposed:

Automation - the substitution of a mechanical and/or electronic
device in a man-machine system for a function previously
requiring human perceptual, cognitive, memory, decision
making capabilities or psychomotor response.

* After 15 February 1957 the author will be associated with the International
Business Machines Corporation (Endicott, New York).

This paper's interest in automation is from a man-machine integration
standpoint. The type of automated systems of prime interest are weapon
systems. These fall into the category of fire control and guidance systems,
primarily for guided missiles. Automation, using the operational definition,
is an inherent component of all of these systems. The question to be raised
at this time is how far these systems should be automatized, considering the
reliability of the output of the entire system. Since it is not probable
that we will have systems entirely independent of human influences in the
immediate future, either from an operational or a maintenance standpoint,
we will still be dealing with man-machine systems. There are several
factors which must be considered in the establishment of criteria for a
point of diminishing returns in an automated system. These criteria are
factors which strongly influence the reliability of the total system.
Perhaps at this point the term reliability as used here should be defined.

The term "reliability" should mean essentially the probability of a
man-machine system, in this case a weapon system, accomplishing its military
mission.


Factors inherent in the system which will influence the reliability of
the system are:

1. Complexity of the mechanical, electronic, hydraulic and
communicative components of the system.

2. Reliability of the parts making up these components.

3. Environmental factors such as temperature extremes, vibration,
shock and acceleration which will influence the reliability of both parts
and components.

4. The quality and quantity of manpower required to operate and service
the system.

5. Environmental and mental stresses placed on the manpower serving
the system.

Further consideration must also be made as to the intended use, from a
tactical point of view, of any particular weapon system. Requirements exist
which limit the size and weight of weapon systems. Consideration as to
production cost, maintainability, and transportation of such systems must
also be made. How then may criteria be established which will provide the
planners and designers of automated equipment with sufficient information
so as to enable them to develop systems which will meet both technical and
tactical specifications? A fairly obvious answer presents itself immediately.
Merely determine the capability and reliability of functioning of the
particular machines involved and the capability and reliability of the men who

The answer sounds fairly simple. The implementation, however, leads
us into a variety of problems. One of the aims in the design of electronic
machines is the development of high performance equipment using automatic
control, guidance and computing features. The incorporation of these
features generally results in more complex and sometimes less reliable
equipment. A serious question previously raised by Boodman (2) is the
problem of deciding what degree of reliability in a given operation is
acceptable and of determining the degree of complexity in a machine that
will decrease the reliability beyond this acceptable value.
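One way to make the complexity question concrete is a series model, in which every part must function and failures are independent. This model and the numerical values below are assumptions introduced only for illustration; the paper does not propose a specific formula.

```python
import math


def series_reliability(part_reliability, n_parts):
    """Reliability of n independent parts that must all work (series model)."""
    return part_reliability ** n_parts


def max_parts(part_reliability, required_reliability):
    """Largest part count whose series reliability still meets the requirement."""
    return math.floor(math.log(required_reliability) / math.log(part_reliability))


# Hypothetical figures: parts that are each 99.9% reliable, 90% required overall.
print(series_reliability(0.999, 100))
print(max_parts(0.999, 0.90))  # 105
```

Under these assumed figures the equipment could contain at most 105 such parts before falling below the acceptable value, which illustrates how a reliability requirement translates into a ceiling on complexity.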


To consider the dependence of equipment reliability upon equipment
complexity, the factors previously mentioned as affecting component
reliability must be understood and a measure of equipment complexity must be
established. By the same token, the complexity and reliability of the
human component of a man-machine system must also be considered. Factors
affecting the reliability of a human operator, in a broad sense, are somewhat
similar to those factors which affect the reliability of machines.
Environmental extremes, visual limitations, auditory limitations, noise and
vibration are some of the external forces influencing the human being.
Unfortunately, the human machine cannot be subjected to standardization of
parts and quality controls in production. Therefore, we must consider
individual differences as factors of fatigue, motivation, perceptual
stress, cognitive ability and psychomotor limitations in determining the
reliability of the human being. The problem of establishing reliability
criteria on either machines or man is in itself a most difficult task. The
problem becomes even more difficult when the relationship between the
complexity and reliability of machines and the functioning of human
components within the system must be combined to arrive at an output figure.


No definitive or inclusive approach is known to the writer at the
present time. It is intended that this problem will stimulate the thinking
of scientists concerned with the design of complex machines and the people
who must operate and maintain them in order that hypotheses be formulated
and tested which will lead towards even partial answers to the problems.


One approach which may be considered is that of attempting to
establish a quantitative and qualitative measure of complexity for machines.
In these measures of complexity must be incorporated varying degrees of
necessary human input. Perhaps by studying the interaction of man and
machine in a variety of situations which range from simple to complex in
terms of both human and equipment functioning, criteria may be developed
which will indicate fixed points delineating optimum functions in
integrating man and machine.


lKTVhfihhG Mttl&R A Btf.El'CJT I-iiUiTf TASTE SPECIFICATIONS


Norman J. Gutman
QM Food and Container Institute
Chicago, Illinois


The quality of many food products cannot be determined exclusively by
objective physical and chemical tests. Thus, it is necessary to have
some measure of taste or palatability by a consumer or expert panel.
Therefore, the specifications written for various foods require that
certain taste or palatability criteria be met. Our problem is that of
setting these criteria on a basis which protects the legitimate interest
of both the government and the producer.

In  practice  the  problem  arises  in  two  separate  stages.  First,  a 
standard  for  a  satisfactory  product  is  to  be  established.  Second,  as 
individual  contracts  are  let,  it  is  necessary,  by  pre-award  testing,  to 
determine  whether  the  product  submitted  for  evaluation  meets  the  established 
standard. 

In the first stage, establishing a standard, the present practice
is to have a group of persons, either military or civilians, at an Army
post or at the QM Food and Container Institute, all depending on the
particular product, rate certain samples on nine point scales. The
standard most commonly used is a preference scale called the hedonic scale;
a quality grading scale is somewhat less frequently used. This talk will
not consider the questions of adequacy of scale, dimensions of preference,
effect of one sample upon the rating of another, and other such problems
discussed by Professor Bradley. Whether these limitations will be a
serious oversimplification is a point which might well be considered.

Most specifications, as presently written, require that any sample
whose mean scale rating is significantly below the mean rating of all
samples at the level be rejected from further consideration in
establishing the standard. Depending on the specification, the test may range
from the erroneous application of a multiple Student t test using the gross
variance of the rating of a sample to a multiple range or multiple F test
such as those developed by Duncan, Tukey, Dunnett, or Bechhofer. However,
none of these tests is directly valid since ratings for any one sample
must be compared with the average rating of all samples.

A question which immediately arises is why a sample's rating should
be compared with the average rating of all samples. The justification
essentially is that the samples submitted for testing are representative
of the quality of product available, and only those samples which rate
sufficiently below the average should be rejected. It may be added that
markedly inferior samples are usually screened out by chemical and physical
tests before the taste test is run.

The second stage arises after standards for the product have been
established. Now the problem is to see that the samples of product
submitted for pre-award evaluation meet these standards. But in this case,
is a product to be compared with its previous quality or with the standard
of satisfactory products established at the previous evaluation? The
latter comparison has been preferred since in the first, if a product is
of very high quality in the first evaluation but is of lower quality on
the second, it is rejected while another product of the same quality in
the second evaluation but of lower quality on the first will be accepted.

Here the problem of comparison with the standard obtained in the first
evaluation arises. One might compare the average level of ratings for
satisfactory products in the first evaluation with the average rating for a
product in the second (or pre-award) evaluation. Since it is known that the
level of preference ratings may vary considerably with time and with the
group rating, this comparison is somewhat untrustworthy. Thus, if the
general level of ratings on the first test is high, while on the second test
it is low, pre-award samples may be rejected even though they are of as
good a quality as those on the first evaluation. Conversely, a low rating
group on the first evaluation and high rating group on the second
evaluation may result in a poor quality product's being purchased.

In the past it was necessary to follow this practice in all products,
and it is still followed in some products. In a few products whose quality
is not affected too seriously by a reasonable length of storage (say one year
or less) satisfactory samples of the products from the first evaluation
are stored. Then, when a pre-award evaluation is necessary, samples from
the previously satisfactory productions are tested along with the pre-award
samples; thus a more legitimate comparison can be made. In other products
where manufacturing practices permit, a different method is used. Through
procurement or its own production, the Institute obtains satisfactory
samples of a product to be established as a standard. Then these standard
samples are submitted by invitation to a group of producers who are asked
to submit pre-award samples at least as good as the standard sample. Then
on pre-award evaluation, the pre-award samples are tested together with the
standard. However, some products are relatively perishable, and so those
procedures cannot be followed, and the pre-award samples must be compared
with the average level of ratings of satisfactory products in the first
evaluation.
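The difference between the two kinds of comparison can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each panel adds a constant shift to the true quality of every sample it rates; the quality figures and shifts are hypothetical and are not taken from any evaluation.

```python
# Hypothetical numbers, chosen only to illustrate the bias described above.

def mean(values):
    return sum(values) / len(values)

def panel_ratings(true_quality, panel_shift):
    """Ratings a panel would give: true quality plus that panel's overall level."""
    return [q + panel_shift for q in true_quality]

# True quality of the standard and of a pre-award sample (assumed equal).
standard_quality = [6.0, 6.5, 7.0]
pre_award_quality = [6.0, 6.5, 7.0]

first_panel_shift = 0.8     # first evaluation: a high-rating group (assumed)
second_panel_shift = -0.7   # pre-award evaluation: a low-rating group (assumed)

# Method A: compare with the average level recorded at the first evaluation.
stored_standard_mean = mean(panel_ratings(standard_quality, first_panel_shift))
pre_award_mean = mean(panel_ratings(pre_award_quality, second_panel_shift))
diff_vs_stored = pre_award_mean - stored_standard_mean

# Method B: rate the standard again, together with the pre-award sample.
concurrent_standard_mean = mean(panel_ratings(standard_quality, second_panel_shift))
diff_vs_concurrent = pre_award_mean - concurrent_standard_mean
```

Under these assumptions the pre-award sample appears inferior when judged against the stored average, although it is of equal quality, while the side-by-side comparison shows no difference because the panel shift cancels.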

In the first two situations where a direct comparison among the
pre-award samples and the standards can be made, a test such as that of
Dunnett (JASA, Dec. 1955) is readily applicable. In the remaining situation,
where the pre-award samples are compared with the level of ratings set as the
standard, the specifications as presently written require that a multiple
t test be used. It appears that the Duncan, Tukey, Dunnett, and Bechhofer
multiple comparison tests are not directly applicable to this problem.

These are problems which vitally affect the Armed Forces, and any
assistance in their solution will be deeply appreciated.


EXPERIMENTAL DESIGN FOR DETERMINING SPECIFICATION
LIMITS FOR MANGANESE-ALUMINUM BRONZE


S. L. Eisler
Rock Island Arsenal

Federal Specification QQ-C-523 covers the procurement of manganese
and manganese-aluminum bronze ingots for remelting. There are several
alloys with various chemical composition limits specified. In addition,
there are mechanical property requirements such as tensile strength,
yield strength and elongation.

The problem which has been encountered on numerous occasions is that
suppliers are able to easily meet the chemical requirements, but not the
physical requirements. This naturally leads to a great deal of discussion
as many suppliers feel that if the material passes the chemical analysis
it will possess the mechanical properties required. Unfortunately, this
is not true and it is the opinion of the metallurgists at Rock Island
Arsenal that the limits for chemical composition are too broad. It is
also their opinion that conditions of preparation of the ingots, although
contributory to their final properties, are of minor significance.
Therefore, we are interested in studying the changes in physical properties
as the percentage of each alloying element is varied within the
specification limits.

For example, let us consider the requirements for Alloys B & C, which
have the same chemical composition limits but different mechanical property
requirements:


Chemical Composition

    Copper        60 - 68%
    Aluminum      3.0 - 7.5%
    Manganese     2.5 - 5.0%
    Iron          2.0 - 4.0%
    Tin           <0.10%
    Lead          <0.10%
    Nickel        <1.0%
    Zinc          Remainder

Mechanical Properties                    B           C

    Tensile Strength (min. psi)       90,000     110,000
    Yield Strength (min. psi)         45,000      60,000
    Elongation (min.)                    18%         12%

This particular example presents a more complicated problem due to
the dual set of mechanical property requirements. However, it has been
chosen as an example as it is believed that separate chemical compositions
should possibly be specified for each alloy. Although this example is
more complicated than the other alloys specified, the same difficulties
have been encountered.

The problem which we would like to present to this clinical session
today is how can we design an experiment to determine reduced limits for
the more important elements, such as copper and zinc, which will insure
conformance with the mechanical requirements. It will be noted that as
the copper content is increased the zinc content is similarly reduced,
providing that the contents of the other elements are unchanged. This
presents a difficult situation as you can not change the content of one
element independently of the other.
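The dependence between copper and zinc can be made concrete with a short sketch. It enumerates a grid of trial compositions within the specification limits for copper, aluminum, manganese and iron, and computes zinc as the remainder. The three-level grid is an assumption made for illustration, and tin, lead and nickel are taken as zero for simplicity; neither is a recommendation from the paper.

```python
from itertools import product

# Specification limits (percent by weight) for Alloys B and C.
LIMITS = {
    "copper": (60.0, 68.0),
    "aluminum": (3.0, 7.5),
    "manganese": (2.5, 5.0),
    "iron": (2.0, 4.0),
}

def candidate_melts(levels_per_element=3):
    """Enumerate a grid of compositions, with zinc computed as the remainder."""
    grids = {}
    for element, (low, high) in LIMITS.items():
        step = (high - low) / (levels_per_element - 1)
        grids[element] = [low + i * step for i in range(levels_per_element)]
    melts = []
    for combo in product(*grids.values()):
        melt = dict(zip(grids.keys(), combo))
        # Zinc is not a free factor: it fills whatever balance is left.
        # Tin, lead and nickel are assumed to be zero in this sketch.
        melt["zinc"] = 100.0 - sum(combo)
        melts.append(melt)
    return melts

melts = candidate_melts()
zinc_values = [m["zinc"] for m in melts]
```

Every row of such a grid sums to 100 percent, so zinc cannot be treated as a fifth independent factor. Any design for this problem has to treat one element as the balance or work with proportions of the mixture.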

It has been suggested that an extensive review of past data and
comparison of the composition and mechanical properties of past lots
might prove valuable. However, after checking over some of the past
data it was found that insufficient information was available.

Therefore, we are open for ideas which will simplify this
investigation. Perhaps someone present has encountered a similar
metallurgical problem.

I might add that this problem is not common to this specification
alone. It is also quite common to Federal Specification QQ-B-675 which
covers Aluminum-Bronze Ingots.


SAMPLING PLAN FOR PACKAGING MATERIALS
PRODUCED BY A CONTINUOUS PROCESS


S. L. Eisler
Rock Island Arsenal

The Department of the Army purchases a large number of packaging
materials which are products of a continuous manufacturing process. This
is true of various paper products, barrier materials, textiles, tapes, etc.
During the manufacturing process, the continuously produced product is
rolled into convenient sized rolls. Unfortunately, in most cases the
identification of rolls in order of production within a lot is not
available.


Thus, an inspector may be faced with the problem of selecting a
representative sample from a shipment of 100 or more rolls for laboratory
tests. There are numerous sampling methods presented in the literature
for sampling carloads of coal or salt, tank cars of oil or acid, and, of
course, the numerous methods of selecting a sample of a discrete
manufactured unit. However, the problem mentioned above is unlike any of
these situations due to the fact that samples from the interior of the rolls
are not readily accessible.

Therefore, it is believed that the first step must be a study to
determine the magnitude of the various sources of variability. The three
major sources of variability are probably:

1.  Edge to edge variation.

2.  Within roll variation.

3.  Between roll variation.

From the results of this preliminary investigation conducted on products
from a representative cross-section of suppliers, it should be possible
to test the significance of the variabilities of the above three sources
against the variabilities of the different tests employed.

Based on the above comparisons, definite sampling plan recommendations
could be made which would result in samples which would reflect the
variations considered significant. For example, if the edge to edge
variation were the only one found to be significant, one sample taken from
any roll would be sufficient, provided the individual test specimens were
randomly chosen from the sample.
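One way such a preliminary study might be analyzed is sketched below, assuming a balanced nested layout: several rolls, several cuts along the length of each roll, and several specimens taken across the width of each cut. The layout, the function name and the data are hypothetical. The across-width figure also contains the error of the test itself, since the two cannot be separated without repeat tests on the same specimen.

```python
def mean(xs):
    return sum(xs) / len(xs)

def nested_variance_components(data):
    """data[roll][cut] is a list of specimen measurements (balanced design)."""
    a = len(data)            # number of rolls
    b = len(data[0])         # cuts per roll
    n = len(data[0][0])      # specimens per cut, spread across the width
    cut_means = [[mean(cut) for cut in roll] for roll in data]
    roll_means = [mean(cms) for cms in cut_means]
    grand = mean(roll_means)

    ss_roll = b * n * sum((rm - grand) ** 2 for rm in roll_means)
    ss_cut = n * sum((cm - rm) ** 2
                     for cms, rm in zip(cut_means, roll_means) for cm in cms)
    ss_spec = sum((x - cm) ** 2
                  for roll, cms in zip(data, cut_means)
                  for cut, cm in zip(roll, cms) for x in cut)

    ms_roll = ss_roll / (a - 1)
    ms_cut = ss_cut / (a * (b - 1))
    ms_spec = ss_spec / (a * b * (n - 1))

    # Method-of-moments estimates; negative estimates are truncated at zero.
    return {
        "across_width": ms_spec,
        "within_roll": max((ms_cut - ms_spec) / n, 0.0),
        "between_roll": max((ms_roll - ms_cut) / (b * n), 0.0),
    }

# Hypothetical measurements: 2 rolls x 2 cuts x 2 specimens.
example = [[[10, 12], [14, 16]], [[20, 22], [24, 26]]]
components = nested_variance_components(example)
```

Comparing the three estimates indicates which source dominates, and hence whether additional rolls, additional cuts per roll, or additional specimens per cut would be most useful in a sampling plan.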

Many of the current military specifications for materials of this type
state the sample size, e.g., in square yards, and even specify the number
of square yards to be taken from a roll. However, it is believed that these
choices have been made without a realistic statistical evaluation of the
material, such as is proposed.

There are now two or three questions I should like to present to the
panel and the others in attendance.

1.  Does our approach to the problem appear to be reasonable?

2.  Does anyone know of any similar materials which have been studied?
If so, what type of sampling plan resulted from these studies?

3.  How may the various procedures for sampling inspection by variables
(ORD-M608-10) where definite units of product are specified be converted to
apply to material produced by a continuous process? What constitutes a
unit of product for material of this type?

EXAMPLE 


For example, ORD-M608-10 specifies for a lot size of 10,000 (assuming
we have 10,000 sq. yds. of material in the lot and have designated 1 sq.
yd. as the unit of product) a sample size of seventy. MIL-B-121A, which
covers barrier material, specifies iiO sq. yds. for a similar size lot.
The total amount of material required for the laboratory tests is
approximately 6 sq. yds.

Now, the question arises as to how the test specimens are to be
distributed throughout the sample. There is also no way in which the
acceptance criteria of a variables sampling plan may be used where a
measurement is not made on each unit of product but where a number of
measurements are taken on the entire sample made up of several units of
product.


OBSERVATION ON THE USE OF MODELS IN THE DESIGN OF EXPERIMENT


James  W.  Mitchell 
Frankford  Arsenal 

The importance of models in statistics is almost obvious.
Mathematical models are widely used to express statistical tests and as a
basis for deriving new ones. However, exact mathematical models in equation
form are usually no easier to understand by scientists and engineers in
other fields than the rest of the language of statistics. In communication
between statistician and other scientists and administrators, models can
play an important role in clarifying understanding of a problem in
statistics.

One would begin by statement of the hypotheses in terms of models -
but not necessarily mathematical forms. Thus the problem is defined in a
form understood by the statistician as the basis of a well defined
statistical test and by the engineer as a form which his collected data may
take. It should therefore be of tremendous help in refining the statement
of the problem to the mutual satisfaction of both statistician and scientist
and thus form a common meeting ground for the two. It is my thesis to try to
exploit this property of models to greater advantage to improve the
communication between scientist, engineer and statistician.

A discussion of scientific models can lead one far into the field of
philosophy and logic. This would be unwise to attempt. However, it is
well to recognize three levels of model making. First, a complete scientific
model of an experiment would encompass all the possible concepts and relations
which a scientist could use and thus it is an ideal of science. It could
involve the whole wealth of modern logic and mathematics, the fields of
science needed to describe the possible phenomena and the definition involved
thus requires the aid of psychologist and sociologist as well. It is quite
a formal structure and probably never has been fully realized in any field.
Today many partial models are being constructed to suit the various sciences.
The expression of physical laws or the statistical concepts which we have
been hearing about in terms of mathematical equations represents these
partial models. These are the working models used by the scientist in his
field to advance his study of the science. However there is still another
level of models needed today. These are models required to create common
understandings between dependent but different fields of science on the
level of the common worker.

Let's examine an example of the formation of a problem model. One
observes a difference in some measured property between two or more groups
of items and forms an explanation of the difference. This explanation is
then contrasted with the universally applicable hypothesis of randomness.
Statistics are then applied by creating a specific statistical (null)
hypothesis or model out of the vague concepts of random phenomena.
Sometimes the choice of a statistical or random model is obvious; sometimes
it is far from easy to find an acceptable model to match the natural
situation. A model may also be devised for the alternative hypothesis
corresponding to the physical explanation of the difference. Although it
is often not needed, the latter would be essentially one of difference,
correlation or non-randomness. The statistical test is then applied by
comparing the experimental data, collected under the assumption of random
sampling, with the statistical model. A choice between the null and
alternate hypothesis is then made according to whether or not the composition
of the data can be explained by this statistical model. The model must be
specific in the sense that one can calculate from it the probability of
occurrence of deviations from the assumed average composition of the model.
The magnitude of the deviation of the experimental data from the assumed
statistical model then forms the basis of choice between the null and
alternate hypothesis, i.e., between the statistical and physical models of
the experiment.

Statistical procedures which fit the above example are the comparison
of two or a set of averages or variances and related tests. The random model
for these is the normal distribution. This model is easily understood and
can be concretely illustrated in a variety of ways (e.g. the Quincunx).
Other closely related models are the binomial and Poisson distributions.
The familiar urn containing balls of two colors is a physical model of these
distributions.
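The urn model lends itself to a short calculation showing what is meant by a model from which the probability of deviations can be computed. The sketch below assumes draws with replacement from an urn with a fixed proportion of one color, which gives the binomial distribution; the numbers of draws and of balls observed are hypothetical.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): n draws with replacement from the urn."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical experiment: 10 draws, 9 of one color, under a null model in
# which the two colors are present in equal proportions.
p_value = binomial_tail(10, 9, 0.5)
```

A small tail probability means the observed count would be a rare deviation under the urn model, which is the basis on which the null model would be set aside in favor of the physical explanation.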

In order to form a logical basis of the statistical test the model
should have certain properties. These are: first, the property of being
specific in the sense that it permits the adoption of specific statistical
procedures. The normal distribution is a good example of this. Models
must also satisfy certain requirements of randomness and may contain
arbitrary elements that are not "natural" but which do not conflict with the
possible alternate hypotheses.

It is certainly not necessary to construct a model about the null or
statistical hypothesis. Physical concepts which can be expressed in
mathematical form are best represented by this "mathematical model". The
physical concept usually implies causality. The simplest form of a
mathematical model would probably be a linear regression in two variables. In
a more general example there are multivariate regression, power functions
and any number of possible mathematical forms representing specific types
of causality and even natural law. In each case the a priori assumption
of one of these relationships constitutes a mathematical model of the
portion of the physical universe to be examined. One then wishes to see
if the experimental data are consistent with or will support this hypothesis.
The procedure of statistical testing requires the creation of an alternate
statistical or random model in which the display of experimental
observations is attributed to chance alone. These statistical models are
usually more complicated than the simple normal or other distributions
referred to previously. In fact the statistical model can be considered as
N dimensional for an N dimensional physical law. However to be able to treat
the results quantitatively with the usual tests of significance, some
specific distribution function must be assumed and applied one dimension at
a time, i.e., coefficient by coefficient. The mathematical and the
statistical model may then be used together to illustrate the application of
statistics to the problem.

Another class of models are those on which the factorial experiment and
randomized block designs are based. The statistical model is a randomized
area, or N-dimensional volume as for the causal relationship above, and the
physical model is a form of the multivariate equation but which includes
terms for experimental error and other variation as well as the main variable
terms and their interactions.
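The main variable terms and interaction of a factorial model can be shown in their smallest form with a two-factor, two-level layout. The sketch below is illustrative only: the cell means are hypothetical, and the contrast definitions are the usual textbook ones rather than anything stated in the paper.

```python
def factorial_2x2_effects(y):
    """y[(a, b)] is the mean response with factor A at level a and B at level b."""
    # Main effect of A: average change in response when A goes from 0 to 1.
    main_a = (y[(1, 0)] + y[(1, 1)] - y[(0, 0)] - y[(0, 1)]) / 2
    # Main effect of B: average change in response when B goes from 0 to 1.
    main_b = (y[(0, 1)] + y[(1, 1)] - y[(0, 0)] - y[(1, 0)]) / 2
    # Interaction: half the difference between the effect of A at the high
    # level of B and the effect of A at the low level of B.
    interaction = (y[(0, 0)] + y[(1, 1)] - y[(1, 0)] - y[(0, 1)]) / 2
    return main_a, main_b, interaction

# Hypothetical cell means.
cell_means = {(0, 0): 20.0, (1, 0): 30.0, (0, 1): 25.0, (1, 1): 45.0}
effects = factorial_2x2_effects(cell_means)
```

With replicate observations in each cell, the scatter about the cell means would supply the experimental error term against which these effects are judged.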

I hope that these examples are sufficient to illustrate some forms that
a model may assume. These may be mathematical, physical such as the urn and
balls or a roulette wheel, spatial as an N-dimensional model or even
mechanical such as a model to show the interaction of tolerances. A
type of model which is comprehended by the engineer in some familiar
dimensional or spatial form and by the statistician as a specific model of a
random distribution of objects or events is a particularly useful form for
improving the experimental design. If these two start by reducing the
problem to a statistical model of the null hypothesis, the similarity of
this model to the preconceived physical or "natural" model of the experiment
will be easier to see. The statistical and physical models can then be
refined until the experimenter is satisfied. The physical model, thus
defined, becomes an alternate hypothesis and this interplay may even lead to
other alternate hypotheses that deserve consideration. Often it may happen
that a scientist is led to accept one statistical procedure as best suited
to his need when it is not entirely appropriate to his experiment. The
practice of first settling on the correct model with several possible
statistical tests in mind should prevent this and would permit full
utilization of the model as a meeting ground between the experimenter and
the statistician until an acceptable test is found.


SHORT  RANGE  SCATTER  PROPAGATION  SURVEY 


Messrs. Lacy, Sharp and Lindner
Signal  Corps  Engineering  Laboratories 

INTRODUCTION: The technique of photographing the returned terrain
scattered power, observed on a radar scope, and then overlaying the
photographed terrain scattering areas on a properly oriented contour map of
the swept area surrounding the radar location, displays immediately the radio
line-of-sight paths. Such displays of the scattering areas indicate all
prospective communication paths between the location of the radar and the
areas producing the scatter. The returned power from the scattering areas
shown on the contour map must be correlated with the system gain of the
microwave communication equipment to be employed. Information relative to
the actual path transmission loss between the location of the radar and any
point in those areas producing the scatter would definitely determine the
feasibility of a prospective communication site. This information is not
obtainable from the photograph and would necessitate an actual path
transmission loss measurement between the location of the radar and the
particular point in the areas producing the scatter. This is not feasible for
the intended application of the above mapping technique for the siting of
short range microwave communication equipment with fifteen foot high
antennas.

DISCUSSION: The actual path transmission loss for the microwave
frequency to be used is the sum of the free space path transmission loss
determined by the distance between the radar location and the proposed
communication site, and the terrain factor power loss determined by the type
of terrain along the communication path. There is not available to date
sufficient data that would correlate the type of terrain of the
communication path with the terrain factor power loss. Were such a
correlation available, then, from such a map overlay as shown in Fig. 1* and
with a knowledge of the type of terrain of the communication path, the
feasibility of establishing communication over the path involved could be
readily determined. We are now concerned with the obtaining of such data and
the best means of establishing such a correlation, if it exists, from the
experimental data obtained to date. This, it can be readily seen, will
not be an easy task when one considers the many types of terrain that can
be involved and the magnitude of the contribution to the communication
path transmission loss of the terrain factor, particularly as it is
affected by the terrain in the immediate vicinity of the transmitting and
receiving sites.

In addition to the photographing of the scatter pattern observable on
the radar scope, the received scattered pulse amplitude from the area at
the desired communications site is compared to the radar transmitted pulse
amplitude. The received amplitude scattered by a given area is from many
scatterers comprising the area. The received amplitude from the scattering
area is compared to the radar transmitted amplitude. From a measurement
the ratio of the receiver input scattered return average power to the radar
transmitter average power output expressed in db is obtained.

* Figures appear at end of the article.

Goldstein has shown in the book "Propagation of Short Radio Waves" of
the MIT Series that the total average radar received scattered signal power
summed up for many scatterers in the target area, where the same antenna
is employed for transmitting and receiving, is the following designated
equation (1).

(1)  [equation not legible in the source]


In this expression, G is the maximum antenna power gain, the first factor
under the summation sign is the antenna pattern function, the second factor
is the free space power loss referred to a doublet radiator, the third
factor is a measure of the scattered power as a function of σ_j, the
scatter cross section of the "jth" scatterer, and the last factor under
the summation sign is proportional to the magnitude of the Poynting vector
of the incident wave at R_j at such a time that the reflected echo from the
"jth" scatterer returns to the radar at the instant of time t_0. At those
distances from the radar to the target area where it can be assumed that
the sum in equation (1) involves a very large number of scatterers, the
summation may be replaced by an integral, where N is the density function
which gives the number of scatterers in an area element R dR dφ for which
the radar cross section lies between σ and σ + dσ. Where it can be further
assumed that the scatterers are distributed uniformly and homogeneously
over the target area, so that the function N is only a function of σ, and
that the free space power loss is independent of R_j, equation (1) becomes
equation (2).

In the expression designated as equation (2), P_R is the total average
radar received scattered power. The first factor on the right is the free
space path power loss, the second factor is the combined power gain of the
transmitting and receiving antenna modified by the antenna pattern over the
target area, the third factor is a measure of the scattered power from the
target area, and the fourth factor is the transmitter power output. Now
equation (2) holds approximately for distances in excess of six miles. As
the distances increase, the more accurate equation (2) becomes. For
distances less than six miles equation (2) does not hold, and equation (1)
involving the summation from individual scatterers must be employed. If the
radar transmission path is over a terrain, then a two-way terrain loss
factor must be added.

Hence for distances in excess of six miles, it is approximately
accurate that the ratio of the radar receiver input target area scattered
power to the radar transmitter power output expressed in db is equal to the
combined power gain of the transmitting and receiving antenna modified by
the antenna pattern over the target area expressed in db, plus the free
space power loss expressed in db for the distance from the radar set to
the target area, plus the terrain factor power loss expressed in db for
the transmission path from the radar set to the target area and return
path to the radar set, plus a loss expressed in db which is a measure of
the scattered power from a selected target area. This relation is
equation (3).

(3)  10 log (P_R / P_T) = 10 log [G² × antenna pattern factor]
                          + 10 log [free space loss factor]
                          + 10 log [terrain loss factor] + 10 log X

P_T is the radar transmitter average power output, and P_R is the radar
receiver average input target scattered power. The first term on the
right is the combined power gain of the transmitting and receiving antennae
modified by the antenna pattern over the target area expressed in db. The
second term is the free space power loss referred to a doublet radiator
expressed in db. The third term is the forward and return terrain factor
power loss expressed in db. The fourth term is a measure of the target
area scattered power expressed in db.
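Because equation (3) is a sum of four terms in db, the terrain term can be isolated by subtraction once the other three are known or estimated. The following sketch illustrates only that arithmetic. All numerical values are hypothetical, losses are written as negative db, and the scatter term is assumed known here although in practice it is one of the quantities to be determined.

```python
import math

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

def terrain_loss_db(ratio_db, antenna_db, free_space_db, scatter_db):
    """Solve the four-term decibel sum of equation (3) for the terrain term."""
    return ratio_db - antenna_db - free_space_db - scatter_db

# Hypothetical values in db; none are measurements from the survey.
measured_ratio_db = to_db(1e-15)    # received power / transmitted power
terrain_db = terrain_loss_db(measured_ratio_db, antenna_db=60.0,
                             free_space_db=-180.0, scatter_db=-20.0)
```

The same subtraction shows why the terrain and scatter terms are confounded: the radar measurement fixes only their sum, so a separate determination of one of them is needed before the other can be assigned a value.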

If a correlation can be obtained between the communication path
terrain factor power loss and the various types of terrain of the
communication paths, then with the aid of the radar power ratio measurement a
correlation between the type of terrain along the communication path
and the radar power ratio measurement may be obtainable. Fig. 2 is an
example of how the experimental data is presently summarized. In the
twelve rows are the results of twelve field measurements. Column 1 is the
path length in miles; column 2 is the calculated free space path
transmission loss (Eq. 3); column 3 is the measured terrain loss factor
(the measured path loss from column 4 minus the free space loss, column 2).
Column 5 is the ratio of power level received to power level transmitted
by radar; in column 6, X is a measure of scattered power from the target
area. Column 7 is for the terrain type classification. The problem
submitted is the need for a method of terrain classification that will permit
a predetermination of the path transmission loss from the physical aspects
of the terrain. Column 8 is the relative heights of the selected sites,
another factor believed to be an important consideration for the
prediction of the terrain loss factor.


EXPERIMENTAL DESIGNS FOR ORGANIZATION RESEARCH
USING LIMITED RESOURCES


Raymond  H.  Burros 
Combat  Operations  Research  Group 
Ft. Monroe, Va.

The  title  of  this  paper  is  somewhat  misleading,  since  I  do  not  intend 
to  discuss  specific  details  of  experimental  design.  Instead  I  shall  first 
present  a  methodological  problem  growing  out  of  limitations  in  resources 
available  for  experimental  research  in  military  organization.  Then  I  shall 
present  some  possible  approaches  to  the  solution  of  the  problem  without 
going  into  details  of  experimental  design.  In  a  sense,  therefore,  the 
discussion  will  deal  with  classes  of  designs.  Some  of  the  more  crucial 
assumptions  will  be  examined.  I  shall  conclude  the  discussion  by  presenting 
a  possible  approach  which  may  lead  some  of  you  into  some  new  lines  of 
thinking. 


SOME  CHARACTERISTICS  OF  ORGANIZATION  RESEARCH 

Research in human organization has at least two important
characteristics distinguishing it from research on individual organisms,
human or otherwise. First, the experimental unit is not the single human
being; it is a specified kind of human group, such as an infantry platoon.
Second, the group score is frequently obtained by observing the behavior
of the group, but not necessarily the detailed behavior of each member of
the group. In other words, the group score is often not simply the sum
or mean of the scores of the members of the group, although these members
help to determine the group score.

These  characteristics  imply  that  a  fairly  large  number  of  subjects 
is  needed  to  gather  data  on  a  relatively  small  number  of  types  of 
organization.  The  limitation  on  number  of  troops  available  for  use  as 
subjects  is  most  pressing.  Other  types  of  limited  resources  include 
terrain  and  equipment. 

To make the problem more concrete, let us make some specific
assumptions. First, we have available a regimental combat team, i.e.,
the equivalent of an infantry regiment with additional supporting weapons
units. This will provide 27 rifle platoons of the present size with other
weapons units. Second, different sizes and structures of the infantry
platoon provide the independent variables, and various measures of
effectiveness are the dependent variables. Third, some of the experimental
organizations will demand more enlisted men than does the present day
platoon. The problem is to choose an approach to experimental design
which will take account of resource limitations and still be powerful
enough to detect reasonably important differences.

POSSIBLE  APPROACHES  TO  SOLUTION 


The first approach is to assign at random some of the 27 existing
platoons to the various treatments (organization structures). The
members of the remaining platoons are used to augment those platoons
which require more than the presently allocated strength. This gives us
(say) a total sample of twenty experimental platoons. Because of the great
variability of group scores, however, this approach is probably not powerful
enough to detect differences between as few as four types of organization.

The second approach is to use depleted or "skeletonized" military units.
These would have full complements of commissioned and non-commissioned
officers but would simulate the existence of most of the enlisted men.
Although there may be some possibility of doing this, it would still be
necessary to validate the methodology by means of experiments with complete
military units. Therefore, this approach does not solve the full problem.

The third approach is to take a number of platoons and run each under
all of the treatments, when the number of these is small. This approach
assumes that an existing platoon preserves its essential identity even
though noncommissioned officers and enlisted men are randomly added to it
or removed from it to fit the structure prescribed by the experimental
treatment. Then each of twenty existing platoons can be run under all of the
treatments when the number of treatments is small.

An adequate design for this approach will have to control for two kinds
of order: first, the order in which the treatments are applied to each
platoon, and second, the order in which the platoons are tested. If there
is no reason to expect that either order interacts with treatment, then
several kinds of experimental designs can be applied. There is good reason,
however, to expect interaction between treatments and the orders in which
they are applied to the platoons. Presumably once a platoon has learned
to function under one organization, this learning may either facilitate or
inhibit its performance under a different organization. Psychologists
recognize this as the process of positive or negative transfer. Our
knowledge of transfer is not adequate to predict exactly what will
happen. It is sufficient, however, to justify my assertion that
interaction is likely to be both present and large. If this is so, then such an
approach will not yield trustworthy conclusions about the relative
effectiveness of different kinds of organization. Although I do not mean to assert
that this approach of applying all treatments to each platoon is hopeless,
it may be worthwhile to consider another approach.
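
As an illustration of the first kind of order control only, the following
Python sketch builds a cyclic Latin square of treatment orders. The treatment
labels and the function name are hypothetical and are not taken from any
actual test plan. A square of this kind balances the position at which each
treatment is given, but every treatment always follows the same predecessor,
so it does nothing to remove the transfer effects discussed above.

```python
def cyclic_latin_square(treatments):
    """Return one treatment order per row; row i starts at treatment i."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]


# Four hypothetical organization structures, labeled A to D.
orders = cyclic_latin_square(["A", "B", "C", "D"])

# With twenty platoons, five platoons could be assigned at random to each row.
for row_number, order in enumerate(orders, start=1):
    print(f"order {row_number}: {' -> '.join(order)}")
```

Each treatment appears once in every row and once in every position, which is
all that the square guarantees.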

The fourth and last approach to be considered is somewhat radical.
Whenever the spaces in a table of organization are to be filled to provide
a replication for any treatment, each space is filled at random from all
of the available qualified personnel. In other words, there would be
random sampling with replacement from a stratified finite population. It
would happen, therefore, that a given subject would serve in a number of
experimental military units during his participation in the experimental
program. He would contribute to the effectiveness score of a replication
of several, perhaps of all, the experimental treatments. He might help to
determine the score of more than one replication of a given treatment. In
this approach the usual techniques of analysis of variance for designs not
involving more than one measurement per experimental unit would be applied
if they are applicable. If this approach is legitimate, it may be the best
solution, especially if we desire to use a factorial design with an
appreciable number of subgroups and of replications.

The major question about this approach is the legitimacy of assuming
that the error components of all the scores are independent. The reason
for making this assumption is the possibility that a person's behavior is
strongly influenced by the behavior of the other members of the small group
in which he participates. Even if a person contributes to a number of group
scores, his contribution will be made under different conditions. A man
may be highly cooperative when working as a member of one rifle squad and
rather uncooperative when he is put into another squad. If his behavior is
not consistent under all conditions, then the fact that he helps to
determine more than one group score may not necessarily force the error components
of the scores to be correlated.

The argument against the assumption of independence of errors lies in
the fact that under certain circumstances behavior is remarkably consistent.
For example, suppose that the members of a rifle platoon are firing at
targets in a situation in which the total number of hits can be recorded
but the hits cannot be credited to particular riflemen. Here the group
score is the sum of the individual scores even though the latter are not
themselves recorded. Since the number of hits made by a given person is
nearly constant from time to time, and there are great individual
differences in this, the group scores will be statistically dependent whenever
they are partly determined by the same people.

Suppose now that a mathematical model for the group score is set up
which breaks it down into components in preparation for an analysis of
variance. These components have no simple relationship to the individual
components mentioned earlier. Now if the treatments corresponding to two
scores are different and their error components are independent, then the
scores are independent. Suppose, however, that the scores are dependent
because of re-use of some subjects. Then it is false that both the
treatments are different and the error components are independent. But
by hypothesis, the treatments are different. Therefore the error components
are dependent.

In  other  words,  although  sometimes  there  is  some  reason  to  hope  that 
the  error  components  of  the  group  scores  are  almost  independent  when  the 
subjects  are  used  more  than  once,  there  is  often  good  reason  to  expect 
dependence.  If  this  is  so  then  I  can  imagine  only  two  ways  to  proceed. 

The first is to derive a new mathematical model which will allow
re-use of personnel reassigned by stratified random sampling to form new
experimental units.

The second way is to determine the relationship between levels of
significance claimed by the use of conventional analysis of variance and
the true levels of significance. Perhaps a Monte Carlo approach may be
useful here. Finally, there may be other alternatives which I have not
thought of.
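
A Monte Carlo study of this kind might be sketched in Python as follows. Every
number in the sketch is an assumption made for illustration: a pool of 200
riflemen, units of 40 men, four treatments with five replications each, and a
group score equal to the sum of each man's steady hit count plus a small
occasion-to-occasion disturbance. Strata are ignored, and no treatment has any
real effect, so the proportion of experiments in which the conventional F test
rejects is an estimate of the true level to be set against the nominal 5
percent. The critical value 3.239 is the tabled upper 5 percent point of F with
3 and 16 degrees of freedom.

```python
import random
from statistics import mean

F_CRIT_3_16 = 3.239  # tabled upper 5% point of F(3, 16)


def f_statistic(groups):
    """One-way analysis-of-variance F ratio for a list of score lists."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def simulate_once(rng, pool_size=200, unit_size=40, treatments=4, reps=5):
    """One experiment with no treatment effect and men re-used across units."""
    # Each man's steady number of hits (assumed normal, mean 10, s.d. 3).
    ability = [rng.gauss(10.0, 3.0) for _ in range(pool_size)]
    groups = []
    for _ in range(treatments):
        group = []
        for _ in range(reps):
            # A man fills only one space in a unit, then returns to the pool.
            members = rng.sample(range(pool_size), unit_size)
            group.append(sum(ability[m] + rng.gauss(0.0, 1.0)
                             for m in members))
        groups.append(group)
    return f_statistic(groups)


def estimated_level(trials=500, seed=0):
    """Proportion of null experiments rejected at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = sum(simulate_once(rng) > F_CRIT_3_16 for _ in range(trials))
    return rejections / trials


print(estimated_level())
```

The same skeleton could be rerun with other pool sizes, unit sizes, or rules
for forming units (for example, units formed simultaneously so that no man can
serve in two of them) to see how far the claimed and true levels diverge.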

The thesis of this paper may now be summarized. Experimental research
on military units is faced with a serious limitation on the number of subjects
available. There is some question about the adequacy of conventional
experimental designs. Your help, therefore, is solicited in two respects.
First, you may be able to make suggestions about the use of already available
designs. Second, if all existing designs are in some sense inadequate,
you may become interested in the problem, either to work on it yourself or
to encourage others to do so.

It is necessary that research be done on the organization of military
units. Unless adequate experimental designs are available, however, there
is danger that the experimental evidence may not be sufficient to justify
conclusions drawn from the data. Your help on this problem will be, I
believe, a worthwhile contribution.


PROBLEMS IN ARMY FIELD EXPERIMENTATION


Lt. Col. W. L. Clement
Military Advisor, ORO

An atmosphere of urgency and timeliness surrounds all Army testing
and experimenting today. As a result we find test directives which are
ambitious in scope - having several objectives - which are on a large
scale, encompassing divisions and corps, and which set an extremely short
time limit in which to come up with firm answers. Under these
circumstances it is not surprising that sometimes the answers are not good.

I am going to talk today about some of the problems which arise in
this general area of tests and experiments - a related activity - and
raise some questions for later discussion.

In the first place, Army testers and experimenters are usually not
statisticians or experts in experimental design. Some of the problems
arise from this fact. However, even when the Army man turns to the
literature on these subjects, he quickly becomes engulfed in such unfamiliar
terms as "correlation," "random variability," "independent variables,"
"regression coefficients," and the like. And the examples he finds apply
to such things as roller bearings, hogs, and corn plants. In very few
places can he find literature which uses his terms and his problems -
weapons, units, mobility, training, and the like - and even here a very
close search is needed. Small wonder, then, that an appreciation of valid
testing is not readily apparent.

The first problem then seems to be one of communication - to relate
these agricultural and industrial techniques to military operations and
problems.

Apart from the general atmosphere of urgency and the need for timely
answers, Army testers operate under three general principles, pointed out
by Dr. Meals of CORG in a recent paper:

1.  Tests  must  be  economical. 

2.  Measurements  must  be  valid  and  reliable. 

3.  Tests  must  be  realistic. 

These  three  principles  represent  three  problem  areas  in  themselves.  Number 
3,  achieving  realism,  is  one  of  the  most  difficult. 

So much for general problems. I will now get to some more specific
matters - three in fact. One is tests of an Army combat unit; two,
controllability of this unit; and three, mobility of the same unit.

First, testing (not experimenting with) the T/O&E (Table of Organization
and Equipment) of a combat unit. As an example, let us consider a tank
battalion, the problem being to test it and determine its effectiveness.

Let's see what the announced mission of this unit is, as shown in the
T/O&E:

"To close with and destroy enemy forces, using fire, maneuver,
and shock action in coordination with other arms."

The capabilities are also listed, some of which are:

"Attack or counterattack under hostile fire."

"Destruction of enemy armor by fire."

"High cross-country mobility", etc.

I think, as testers, we are immediately struck with the lack of any
quantitative terms in description of missions and capabilities. The
problem becomes how to translate these terms into measured performance in
the field. Major weaknesses in current tests can be traced to the method
and type of measurements taken - data collected - and to the lack of
realism, as mentioned earlier.

To expand a bit on these weaknesses, most of the ratings given a
unit are subjective. Umpires are used freely, and unfortunately they
generally interpret rather than describe what has occurred. Here are some
typical examples of items which an umpire is called on to rate in a current
training test:

"Was reconnaissance adequate?"

"Did the commander employ his staff properly?"

"Were control measures adequate?"

"Disposition and control of vehicles."

"Secrecy measures."

With these items as a guide, it is certainly difficult for the umpire to
be objective in rating.

Achievement  of  realism  is  another  problem.  Some  work  is  currently  going 
on  in  developing  devices  which  simulate  aspects  of  combat  closely.  Thus, 
instead  of  having  to  rely  on  umpire  decisions,  the  situation  is  somewhat 
realistically  portrayed  on  the  ground.  There  is  much  work  to  be  done  in 
this  area  -  how  to  create  a  combat  atmosphere  throughout  the  test. 

Let's now look further at the T/O&E of this battalion. Psychological
Research Associates, in their work with the rifle squad organization, listed
these as the categories of factors which make up a T/O&E. To briefly run
through the chart, then:

    Column 1        Column 2              Column 3           Column 4
    Independent     Controlled            Dependent          Field
    Variables       Variables             Variables

    Size            Training              Controllability
    Composition     Pers. Capabilities    Fire Delivery
    Equipment       Leadership            Supply
                                          Mobility
                                          etc.

Column 1 lists the T/O&E components in which we are interested -
the independent variables which the test designer is familiar with.

Column 2 lists other characteristics which will affect the test, and
which must be controlled.

Column 3 shows what we are trying to measure - desirable
characteristics, or dependent variables.

Column 4 is reserved for the actual problems or exercises which are
set up to measure 3, and to bring out the differences.

It would seem that at present in Army tests we hold Column 1 constant,
combine 2 and 3, and determine the outcome in 4. We are never really sure
of what in Columns 1, 2, and 3 determined the outcome in 4.

A first order of business, before launching into extensive
experimentation, is, then, to develop methods by which effectiveness of existing
units can be measured more accurately than at present. Techniques,
gadgets, and procedures developed in testing can be directly applied to
experimental work later. And the present series of Army training tests,
which units are subjected to annually, offers a ready-made framework for
the tester to use.

The second specific problem has to do with an experiment to measure
controllability of this battalion - listed as a dependent variable, or
desirable characteristic, in Column 3. In order to experiment, then, we are
going to vary the independent variables in Column 1, control those in
Column 2, and observe and measure controllability in Column 3 through tests
which we will show in Column 4.

Now to define controllability. The commander's control duties can be
divided into two major elements: 1. He plans and decides. 2. He has the
unit execute the plan.

The gap between 1 and 2 is bridged by control - by the commander's
communicating and supervising, and these latter are the factors to be
measured. In other words, we will have a series of tests to measure
communication, and another series to measure supervision.

Size, in Column 1, is the first independent variable we will consider.
I propose to vary size and hold composition and equipment constant, while
measuring controllability; then we will vary the other independent
variables in turn.

An immediate question might well be, in the interests of economy and
time: should we not vary all three simultaneously? If so, in the field
can we practically control these variations so that we know what has
affected the outcome?
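
The difference between the two plans can be seen by listing the runs each one
calls for. The Python sketch below is illustrative only; the levels chosen for
size, composition, and equipment are hypothetical, and the first level of each
variable is taken as the present T/O&E.

```python
from itertools import product

# Hypothetical levels; the first entry of each list is the present T/O&E.
LEVELS = {
    "size": [3, 4, 5],                    # companies per battalion (assumed)
    "composition": ["present", "alternate"],
    "equipment": ["present", "alternate"],
}


def one_at_a_time(levels):
    """Vary each variable in turn while holding the others at present values."""
    baseline = {name: values[0] for name, values in levels.items()}
    runs = [baseline]
    for name, values in levels.items():
        for value in values[1:]:
            runs.append({**baseline, name: value})
    return runs


def full_factorial(levels):
    """Vary all variables simultaneously: every combination of levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


print(len(one_at_a_time(LEVELS)), "organizations when varied in turn")
print(len(full_factorial(LEVELS)), "organizations when varied together")
```

The factorial plan needs more distinct organizations, but it is the only one of
the two in which the effect of size can be examined at each composition and
equipment level.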

Another question: at what echelon in the chain of command will we stop
- at company or platoon? Mr. Eckles, a member of our Armor Group here
at ORO, has pointed out that battalion commanders actually control platoons
in many cases; company commanders act as message centers in some cases,
transmitting the battalion commander's orders to the platoons. This
should not imply that the chain of command is violated. It does suggest,
however, that the battalion commander's control duties do not stop at company
level. As a matter of fact, I recall a sort of rule of thumb in the
Army to the effect that commanders should generally be concerned with the
second echelon below their level. In other words, division commanders
concern themselves with battalions, and battalion commanders concern
themselves with platoons. This, then, is a point which must be settled
before proceeding with our experiment.

What range of sizes do we test, and how is this determined? What
are the upper and lower limits - between 10 companies and 2 companies for
example? We probably can arrive at a logical, practical range of sizes by
querying experienced military people.

How many battalions are needed? Can we use only command echelons, or
do we need the entire unit? Must we proceed through platoon and company
tests first before going to battalion level, or can useful answers be
obtained by approximating performance at the lower levels? These are
very practical, and economical, considerations from the Army point of view.

Now let's turn to Column 2, our controlled variables. How can these
actually be taken into account and controlled? How can we arrive at
meaningful results which could be applicable to the various battalions which
exist today in our many armored units? What is the standard for training,
discipline, and leadership, and how will our experimenter arrive at this
so that he can apply his results universally?

In Columns 3 and 4 we consider test designs which measure our dependent
variable and bring out differences resulting from changes in the independent
variables. These performance tests should be based on critical situations
which will bring out these differences. Again, military opinion is probably
the best source for arriving at these critical situations.

We know that our experiment must be valid; that is, it should measure
what we are actually trying to measure. It should be standard, so that all
groups participating are graded under the same conditions. Scoring should
be accurate and objective, which suggests devices of some type, together
with properly instructed umpires. Our scoring indices must be carefully
planned, so that they actually gauge the performance witnessed. For
example, in communication, percentage of critical words heard might be an
index; percentage of errors might be an index to measure performance.

So much for a brief discussion of some of the problems which arise in
considering an experiment to measure controllability. In order to be
certain that we trigger some response from the audience, let's look at
another experiment. This time we are interested in measuring mobility of
our battalion.

The aspect of mobility with which we are concerned here is vehicular
operability; the unit is as mobile as the number of tanks which it keeps
in operation. This implies that we must consider organization and
equipment used to keep the vehicles running, as well as the vehicles
themselves.

Actually, at present, tank performance is indirectly reflected in the
number of mechanics needed in a unit. A broad average has been taken of
tank performance, and a "vehicle equivalent" has been arrived at which by
rule of thumb allocates so many mechanics for so many tanks. Actually,
vehicle equivalents are used in drawing up T/O&E's of all units having
vehicles of any type.

We intend, therefore, to investigate this vehicle equivalent to
determine in what situations it does apply and what the limiting situations
are.


Again,  turning  to  Column  1  of  our  table,  we  intend  to  vary  size,  here 
meaning  number  of  mechanics.  Some  of  the  same  questions  arise  as  before. 
What  range  of  sizes?  Should  we  vary  the  other  independents  simultaneously? 
What  participating  troops  are  needed?  How  many  battalions,  if  any? 

Looking at Column 2, how do we take into account skills, equipment,
terrain, weather, type of operation, condition and age of vehicles, at
the start of our experiment? How do we relate our results to the real
world of battalions spread from Europe to Korea?

In Columns 3 and 4 we should include situations which measure and
discriminate between performance of vehicles, tools, and mechanics -
critical situations. It would seem that a series of "canned" troubles
might be built into our experiment, built up realistically from data on
failure frequencies.
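
One way such a series might be drawn up is sketched below in Python. The
failure categories and their frequencies are invented for illustration; real
figures would have to come from maintenance records. Troubles are drawn at
random in proportion to the assumed frequencies, and a fixed seed lets every
battalion in the experiment face the same series.

```python
import random

# Hypothetical failures per 100 tank-days; not taken from any Army data.
FAILURE_FREQUENCY = {
    "track": 12,
    "electrical": 9,
    "engine": 6,
    "transmission": 3,
}


def canned_troubles(count, seed=None):
    """Draw `count` troubles in proportion to the assumed failure frequencies."""
    rng = random.Random(seed)
    kinds = list(FAILURE_FREQUENCY)
    weights = [FAILURE_FREQUENCY[kind] for kind in kinds]
    return rng.choices(kinds, weights=weights, k=count)


# The same seed reproduces the same series for every participating unit.
print(canned_troubles(10, seed=1))
```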

What  measurement  constitutes  an  index  of  performance?  Perhaps  time 
would  be  the  best  indicator. 




Having asked many questions and posed several problems, I will
conclude this brief discussion. Perhaps our problems can be summed up generally
in the areas of (1) Communications - understanding experimental design
principles; (2) Economy; (3) Valid and reliable measurements; (4) Realism.


EVALUATION OF INTERLABORATORY TESTS
WITH LIMITED CONTROLS AND DATA

W.  K.  Murray 

Watertown  Arsenal  Laboratories 

The following discussion concerns the problem of the proper
evaluation of data received in connection with some interlaboratory
determinations of oxygen in titanium alloys. The problem is complicated by the
difficulty of achieving proper statistical control of the experiment
when the data are obtained by voluntary cooperation of a number of
laboratories, each of which differs normally, to some degree, in its methods
and procedures. The difficulties which have arisen in this problem are
by no means unique, but are common to most interlaboratory evaluation
problems. It is felt that a solution of some of the questions arising
from this specific problem would have general application.

The background of the specific problem is as follows:

Since the use of titanium has developed only recently, there have
been no standard accepted methods for its chemical analysis. In order
to provide generally acceptable methods, a Panel on Methods of Analysis
has been set up to investigate methods for the determination of each
alloying element or impurity and to recommend suitable analytical
procedures. In the case of most elements, procedures have been developed,
tested by a number of cooperating laboratories and found to be quite
satisfactory with regard to precision and accuracy.

In the determination of oxygen in titanium, however, no procedure
has yet been adopted and recommended for general use. One reason for
this is that there are no standard specimens available containing known
amounts of oxygen against which procedures can be tested.

As a preliminary investigation, it was decided to limit our analysis
to two general sources of variation: that due to the samples and that
due to the laboratories. It is believed that, if we can show
interlaboratory differences to be the significant source of variation, our
problem would be reduced to a study of laboratory methods.

The samples consisted of commercial titanium and titanium alloys
available in stock, thus eliminating any control over their preparation.
The cutting of the original material and randomizing of the samples for
distribution to the different laboratories is the first control we are
able to exercise over the samples in this design. The samples were
distributed to the cooperating laboratories, who were requested to make four
determinations for each titanium alloy using one or both of two suggested
methods; the number of determinations was restricted because of the cost
involved. Homogeneity of the sample being unknown, we attempted by
randomization to reduce the influence of oxygen segregation in the
samples.

The hypotheses we wish to test are: (1) there is no within-sample
variation; (2) there is no between-laboratory variation; and (3) the two
methods tested give similar results. The purpose of this study is to
determine whether the difference in results is due to differences among
laboratories or to segregation in the titanium samples; and, if possible,
to determine whether a technique for determining oxygen in titanium is
suitable for recommendation as an acceptable procedure.

After this general statement of the problem, we should like to
mention some of the specific questions which have arisen and which must
be resolved if a logical statistical approach is to be utilized.

Preliminary to any statistical analysis one must handle the question
of rejecting data. In an experiment such as this one, which is to some
degree uncontrolled, this is an important point. Certain laboratories are
personally known to be more reliable than others by virtue of better
equipment, more experience and other factors. Can one give more weight
to the results of these laboratories than the others and still avoid
biasing the results by personal prejudices? In our case, it is very
tempting to eliminate the results of about half of the thirteen
cooperating laboratories. Previous experience has indicated that there is a
group of laboratories whose work is more reliable than the others. These
laboratories, in this testing program, agreed with each other much more
closely than did the other laboratories. Yet, on purely statistical
grounds, there is no reason to eliminate more than one laboratory on
the basis of the results received.

Another question concerns the analysis of data gathered employing
two different analytical procedures in the same laboratory or in different
laboratories. Should the methods be compared on a laboratory to
laboratory basis or should the results be combined by method? Also, under what
conditions can the laboratory results from different specimens be
combined to investigate differences between laboratories and between methods?
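
To make the first alternative concrete, the Python sketch below compares the
two methods laboratory by laboratory, using only those laboratories that ran
both. The figures are invented for illustration and are not the Panel's data;
the function name is likewise hypothetical. For each laboratory the difference
between its two method means is taken, so that laboratory-to-laboratory
differences cancel, and the mean difference is then set against its standard
error.

```python
from math import sqrt
from statistics import mean, stdev


def paired_method_comparison(by_lab):
    """by_lab maps laboratory -> (method 1 results, method 2 results)."""
    diffs = [mean(m1) - mean(m2) for m1, m2 in by_lab.values()]
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / sqrt(len(diffs))
    # Ratio to be referred to Student's t with len(diffs) - 1 degrees of freedom.
    return mean_diff, std_error, mean_diff / std_error


# Hypothetical oxygen determinations, percent by weight, four per method.
oxygen = {
    "Lab A": ([0.102, 0.108, 0.105, 0.101], [0.098, 0.100, 0.097, 0.101]),
    "Lab B": ([0.121, 0.118, 0.125, 0.120], [0.112, 0.115, 0.110, 0.116]),
    "Lab C": ([0.095, 0.099, 0.097, 0.096], [0.096, 0.094, 0.098, 0.093]),
}

mean_diff, std_error, t_ratio = paired_method_comparison(oxygen)
print(f"mean difference {mean_diff:.4f}, standard error {std_error:.4f}, "
      f"t = {t_ratio:.2f}")
```

Combining the results by method instead would leave the between-laboratory
variation in the error term, which is the substance of the question raised
above.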

These are specific difficulties, but we believe a general discussion
of the attitudes and aims that one should have when confronted with a
problem such as this, in which the controls and data are limited, would
be appropriate. What, for instance, should be the major concern of a
statistical treatment which is the first attempt to exercise statistical
control on the variables under consideration?


A. Bulfinch
Picatinny Arsenal

Engineers and scientists who have recently been introduced to the
subject of statistics often ask: "Just what does one do to design an
experiment in the modern statistical sense?" This is a good question, and
there should be a sensible answer that the engineer can understand and use.
An examination of the literature shows that much has been written on the
subject, but no unified procedure that can be identified as such can be
found in any one document. Too many books have been written for
statisticians and too many handbooks contain only methods of analysis.

The engineer would like something tangible to manipulate, or a set of
instructions that can be followed, something short of book length. The
statistician may say that this is impossible! But his conclusion is based
on the assumption that the engineer is completely ignorant of the subject
of statistics, and that to use statistics one must know all of the designs
and techniques. Experience has shown that this is not true. Many engineers
and scientists will design the most efficient experiment by using just good
common sense. Any one job requires the use of only a few techniques, not
the whole spectrum. From this I have concluded that an explicitly described,
unified design-of-experiment procedure would be useful to engineers. Such
a description may include terms not familiar to the engineer or scientist,
but an effort to understand the definitions of these terms would be the
shortest route to a working knowledge of the design of experiment in the
modern statistical sense.

Planning an experiment along statistical lines forces one to consider
what it is he is seeking and what steps are required to obtain it. This
often leads to the recognition of pitfalls and fallacies in advance of
data collecting.

The "design of experiment" is essentially the pattern of taking
observations. In its broader sense this procedure also includes the
analysis of results. The object of designing an experiment in the modern
statistical sense is twofold.

1. To obtain economy of experimentation. That is, to insure that
essential information is obtained with minimum cost in time and effort.
"Essential information" is defined as information such that additional data
will not change the conclusions drawn, in a practical sense.

2. To obtain a "yardstick" with which to evaluate the results. This
"yardstick" is called the experimental error, which is obtained by
replicating the results.


The "design of experiment" may be regarded as an aspect of the scientific
method. The intrinsic characteristics of the scientific method are the
examination of what is known and the formulation of theories or hypotheses
which may be verified by experimentation. The concept of experimentation
is the crux of the entire matter, for any question whose answer may not
be obtained by planned observations is not in the realm of science.

The actual formulation of hypotheses and theories is a matter of
intuition, native ability, and insight. Verification of these hypotheses
and theories cannot be absolute, for we can only show that the observations
are compatible with the hypothesis within the limits of experimental error.
This is the major reason for the use of the "null" hypothesis in statistics.
We make changes and assume or theorize that these changes have made no
difference, that the difference is "null" or amounts to nothing. In every case
we state our questions to be answered by the experiment in a hypothesis to
be disproven by the data. If we fail to disprove the hypothesis, then we
accept it as true or reserve decision. This means we have three
alternatives: reject the hypothesis, accept the hypothesis, or reserve decision.

In the analysis of variance (of designed experiments) we combine the last
two alternatives and state: "There is not sufficient data to detect a
difference".

The hypothesis that there is no difference (the null hypothesis) is
unrealistic, since different treatments must have produced some difference.
The real problem is to obtain estimates of the magnitude of the difference
and determine whether this has any practical or economic importance.

The acceptance of any hypothesis on the basis of data obtained from
samples of a population or universe is subject to a probability of error.
This principle represents the basis of modern statistical theory. In
testing a hypothesis there are two possible errors: Type I Error is the risk
of rejecting the hypothesis when it is true. Type II Error is the risk of
accepting the hypothesis when it is false. The value of designed
experiments is that they minimize these risks of error with minimum effort. That
is, statistically designed experiments are the most efficient experiments
since they can obtain essential information with minimum cost.

A hypothesis must provide the answer for a practical problem, provide
an explanation of known facts, and give predictions that can be verified.

It  is  essential  that  hypotheses  and  their  outcomes  be  formulated  before 
verification  is  attempted.  Valid  probability  statements  cannot  be  made 
about  statistical  tests  suggested  by  the  data  to  which  they  apply. 

The theory of statistics, which is entirely deductive, provides a
basis for inductive processes. No inductive inference is certain to be
correct, so every conclusion drawn from finite experimental data is subject
to error. With the aid of mathematical statistics, probability
statements may be made about these errors.

The  role  of  statistics  in  the  scientific  method  has  three  functions: 

1. Description - This is the reduction of a mass of data to such
quantities as the mean and the variance. If the data is all of the relevant
information about the whole population, these quantities are called
parameters and the description is deductive. If the data is only a sample of
the whole population, these quantities are called statistics and the
description is inductive.

2. Analysis - This means, given observed values from a sample, to
estimate the population parameters. Also analysis can mean, given observed
values from two samples, to determine whether the two samples came from
the same population.

3. Prediction - This means rational inductive processes. This is
the major objective of the application of the scientific method to natural
phenomena. The practical application of the theory of probability through
the use of statistical techniques has made it possible to make predictions
from controlled experiments with mathematical precision.

Emphasis should be placed on the application of the theory of
probability since at the theory level academic sterility is an ever present danger.
As Bross puts it, "Academitis is a disease characterized by
hairsplitting and eventually, rigor mortis."

For  our  purposes  it  is  useful  to  distinguish  between  two  types  of 
experiments. 

1.  The  determination  of  the  numerical  magnitude  of  a  particular 
characteristic  for  a  specified  population. 

2.  The  determination  of  the  effect  of  two  or  more  treatments  on  a 
particular  population  characteristic. 

In the first type the populations consist of existing items or
properties, and it is simply a matter of measuring them. In the second type the
populations studied are created by the experimenter in the act of taking
measurements. It is in this latter type of experiment that statistical
design techniques are required.

Planning  the  experiment  in  advance  of  data  collecting  cannot  be 
overemphasized.  In  the  past,  an  experiment  was  considered  a  venture  into 
the  unknown,  and  as  such,  any  approach  and  any  result  was  acceptable, 
since  neither  could  be  predicted  or  evaluated.  This  was  a  boon  to  the 
experimenter  and  gave  him  a  free  hand.  But  modern  techniques  have  changed 
all  this  by  furnishing  systematic  procedures  for  designing  experiments  and 
analyzing  the  results.  Inefficient  methods  and  unreliable  data  can  no 
longer  be  tolerated. 

Described below are some of the things that should be done in planning
an efficient experiment and analyzing the results. This is what I believe
engineers want when they ask, "How can I design an experiment?" and what the
literature has glossed over.

a.  Plan  your  experiments  well.  The  conclusions  and  inferences  that 
can  be  drawn  depend  on  the  way  in  which  observations  are  made. 

b. Use common sense. Don't accept results which contradict common
sense.


c. Use all available knowledge and information from past experience.

d. Consider all possible sources of error. List the variables to
be controlled, those to be varied, and the levels of those to be varied.

e. Consider the entire scope of the problem. Without regard to cost,
time, or effort, consider what it is you would like to know eventually.

If this turns out to be a very large experiment, consisting of many
variables, or a very expensive experiment the cost of which is prohibitive,
divide the whole problem into rational parts. This makes possible a
systematically-planned approach. It also makes it possible to relate
your statistical design to cost and the amount of information required.

f. Consider all possible outcomes and their physical interpretation.
Results that have no physical interpretation have no
value.

g.  Choose  carefully  the  criterion  on  which  conclusions  will  be 
based.  Density  results  are  of  little  value  if  the  use  of  the  material 
depends  upon  the  melting  point. 

h. Randomize sample specimens. This can be done by using tables
of random numbers or by drawing numbers out of a hat. In any case,
randomization insures better representative samples and guards against
biased results.
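As a present-day stand-in for the table of random numbers, the randomization step might be sketched as follows. The specimen labels, the seed, and the even split between two treatments are hypothetical choices made only for illustration.

```python
import random

def randomized_run_order(specimens, seed=None):
    """Return the specimens in a random order (a random permutation)."""
    rng = random.Random(seed)   # seeded generator so the plan can be reproduced
    order = list(specimens)     # work on a copy; the caller's list is untouched
    rng.shuffle(order)
    return order

# Hypothetical specimen labels S01 ... S12.
specimens = [f"S{i:02d}" for i in range(1, 13)]
order = randomized_run_order(specimens, seed=1956)

# The first six specimens drawn go to treatment A, the remainder to treatment B.
plan = {"A": sorted(order[:6]), "B": sorted(order[6:])}
print(plan)
```

Recording the seed plays the part of keeping the slips drawn from the hat: the assignment is random, yet it can be reconstructed later.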

i. A valid estimate of experimental error must be obtained with
which to evaluate the results. This can usually be done by taking repeated
measurements under the same controlled conditions. This is called
"replication".

j. The sample size (the number of repeated measurements under the
same controlled conditions) should be adjusted to control the alpha and
beta errors. The alpha error is the risk of rejecting good material, the
Type I error, or the producer's risk. The beta error is the risk of
accepting poor material, the Type II error, or the consumer's risk. In order
to control these errors, some knowledge of the variability (experimental
error) must be available. In addition, a decision must be made
concerning the magnitude of the difference that must be detected to make the
experiment economically feasible.
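One common way to tie these four quantities together is the normal-approximation formula for comparing two means, n = 2((z_alpha + z_beta) * sigma / delta)^2 per group. The sketch below assumes a two-sided test, a known standard deviation, and equal group sizes; these are assumptions of the illustration, not statements from the text.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, beta=0.10):
    """Approximate replicates per group needed to detect a difference `delta`
    between two means, given experimental error `sigma` (standard deviation).

    Assumes a two-sided test at level `alpha`, power 1 - `beta`, and the
    normal approximation with sigma treated as known.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # guards against the Type I error
    z_beta = z.inv_cdf(1 - beta)          # guards against the Type II error
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detecting a difference equal to one standard deviation:
print(sample_size_per_group(sigma=1.0, delta=1.0))
# Halving the difference to be detected roughly quadruples the work:
print(sample_size_per_group(sigma=1.0, delta=0.5))
```

The formula makes the trade-off plain: the required sample size grows with the square of sigma/delta, so the decision about the smallest difference worth detecting governs the cost of the experiment.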

k. Carefully formulate the questions to be answered. Develop
the right hypotheses by asking the right questions which the experimental
results are expected to answer. To show conclusively that process A
gives a higher yield than process B is of little value if neither
produces a usable product.

l. Of the many experimental designs available, choose the one that
fits your particular problem requirements. Factorial designs are very
efficient since they will provide complete information about all of the
variables, as well as their interrelationships, with only a fraction
of the work required by the classical one-at-a-time procedure. This type
of design is particularly useful when little is known about the system
being studied, or when it is known that there is a very complex relationship
among the variables. If the number of variables to be studied exceeds
5 or 6, designs such as the Latin square and fractional factorials should
be considered to effect further economies of experimentation. These
latter designs are also useful for a sequential approach to a problem
containing more than 5 or 6 variables of interest. The analysis of
regression, the analysis of covariance, and the method of confounding are
useful when there are variables that cannot be controlled. The correlation
coefficient and the analysis of regression are useful in studying the
relation between variables, such as cause and effect.

m. A property of these designs, known as orthogonality, should be
controlled in order to simplify the calculations and the interpretation
of the results. This property insures that all the variables (called
main effects) and all of their interrelationships (called interactions)
can be independently estimated without entanglement.

n. Care should be taken so that the effect of one variable is
not confounded or confused with that of another when independent
measurements of each are required. Little can be concluded about the moisture
content of two products, made by different processes, if ambient humidity
conditions are permitted to affect the results. In such a case, the
moisture content due to the process is confounded (or confused) with that
due to the humidity. If the ambient humidity condition is an important
variable in the system, it should be controlled and the experiment
designed to determine its effect. If it cannot be controlled, the
experiment should be designed so that changes in humidity can affect only
unimportant parts of the experiment, such as the higher order interactions.

o. The concept of interaction should be understood. Interaction
is said to be present when certain particular combinations of conditions
produce unusual results. This is the nonadditive or unpredictable
portion of the experiment, and, as such, is the only patentable portion
of the experiment. There can be interaction between two or more factors
(variables). Interactions involving three or more factors are referred
to as the higher order interactions. Interactions involving five or
more factors seldom have any physical interpretation or practical
importance.
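For two factors at two levels each, the nonadditive portion can be expressed as a single number: half the difference between the effect of one factor at the two levels of the other. The function below is a minimal sketch with hypothetical yields; the name `interaction_2x2` and the data are illustrative only.

```python
def interaction_2x2(y11, y12, y21, y22):
    """Interaction contrast for a 2 x 2 layout.

    y_ij is the response at level i of the first factor and level j of the
    second. The result is zero when the two factors act additively.
    """
    effect_at_level_1 = y12 - y11   # effect of the second factor at level 1 of the first
    effect_at_level_2 = y22 - y21   # effect of the second factor at level 2 of the first
    return (effect_at_level_2 - effect_at_level_1) / 2

# Additive case: the second factor adds 5 units at either level of the first.
print(interaction_2x2(10.0, 15.0, 20.0, 25.0))
# Nonadditive case: one particular combination gives an unusually high result.
print(interaction_2x2(10.0, 15.0, 20.0, 35.0))
```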

p.  The  observations  or  measurements  must  be  independent  for  many 
designs.  Measurements  are  said  to  be  independent  if  the  probability  that 
one  of  them  will  have  a  certain  value  is  the  same,  no  matter  what  values 
are  obtained  for  other  measurements.  This  means  that  the  results  cannot  be 
correlated  and  that  the  taking  of  a  measurement  will  not  affect  the 
outcome  of  succeeding  measurements.  For  example,  if  the  first  measurement 
raises  the  temperature  of  the  system,  and  the  results  are  affected  by 
temperature  changes,  then  the  probability  of  reproducing  the  first  result 
with  a  second  measurement  is  nil.  In  such  a  case,  the  temperature  must 
be  controlled  in  order  to  obtain  independent  measurements.  However,  if 
the  variables  are  correlated,  the  analysis  of  regression  or  covariance 
can  be  used. 

q. There must be assurance that the error of measurement (called
the variance) does not change from one portion of the experiment to
another. That is, we must comply with the requirement of homogeneity
of variance. This is important because there are two sources of
variation: the means (or averages) and the variances. If we observe
a difference, we want to be in a position to determine whether it is due
to the means or variances. We are usually interested in changes of the mean
values, so if the variances are constant or homogeneous and we observe a
change, we will be able to conclude that it is due to the means.


r. The concept of degrees of freedom should be understood, since it
is used extensively in the analysis of data. The number of degrees of
freedom is equal to the number of independent observations minus the
number of parameters (such as the means) estimated. In computing the
variance, for example, only (n-1) of the deviations from the mean can be
independent. The nth deviation has to be restricted in order to make all
"n" deviations add up to zero.
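The restriction on the nth deviation can be checked numerically. In the sketch below the five measurements are hypothetical; the last deviation is recovered from the other four, and the variance is computed with the divisor n - 1.

```python
from statistics import mean, variance

x = [9.8, 10.1, 10.4, 9.9, 10.3]          # hypothetical repeated measurements
m = mean(x)
deviations = [xi - m for xi in x]

# The deviations from the sample mean add up to zero (apart from rounding),
# so the last one is fixed once the first n - 1 are known.
last_from_others = -sum(deviations[:-1])

# Sample variance with n - 1 degrees of freedom in the divisor.
s2 = sum(d * d for d in deviations) / (len(x) - 1)

print(sum(deviations), last_from_others, deviations[-1])
print(s2, variance(x))   # statistics.variance uses the same n - 1 divisor
```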


s. The type of measurement to be used should be considered for
the sake of efficiency. Variable type data is data that can vary from
minus infinity to plus infinity on a continuous scale. This type of data
furnishes the most information per observation. Attribute data is
qualitative type data and consists of discrete entities. Attribute data is
sometimes called "go" "no go" data. The latter kind of data gives the
least information per observation.


t. The assumption of normality must be considered, since most
probability statements are based on this assumption. However, if you are
dealing with the distribution of averages or with small sample sizes,
the question of normality is purely academic for the following reasons:


(1) The distribution of all averages can be considered normal,
regardless of the source of the individual values, especially averages
of four or more values.


(2) No reliable test of normality is available for small
sample sizes. In addition, there are robust tests now available which
are insensitive to deviations from normality.


The numerical values of measurable properties of products manufactured
under controlled conditions can be considered normally distributed. The
F-test in the analysis of variance, and the t-test for the difference
between two averages are both insensitive to deviations from normality.

With  this  in  mind,  it  can  be  concluded  that  the  assumption  of  normality 
is  sufficiently  valid  for  most  practical  purposes,  unless  there  is  definite 
information  to  the  contrary.  At  worst,  your  level  of  probability  will 
be  low  by  a  few  percent. 
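How quickly averaging pulls a distribution toward the normal shape can be gauged from two standard results: for the mean of n independent, identically distributed values, the skewness is divided by the square root of n and the excess kurtosis is divided by n. The sketch below applies these to two textbook parent distributions chosen here for illustration; it is a rough gauge, not a test of normality.

```python
from math import sqrt

def shape_of_average(skewness, excess_kurtosis, n):
    """Skewness and excess kurtosis of the mean of n independent values drawn
    from one parent distribution (both are zero for a normal distribution)."""
    return skewness / sqrt(n), excess_kurtosis / n

# Uniform parent: skewness 0, excess kurtosis -1.2.
print(shape_of_average(0.0, -1.2, 4))
# Exponential parent: skewness 2, excess kurtosis 6 (strongly one-sided).
print(shape_of_average(2.0, 6.0, 4))
```

For a symmetric parent the average of four values is already close to normal in shape; for a strongly skewed parent some skewness remains at n = 4, which is one reading of the remark that the stated probability level may be off by a few percent.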


u. The saving of time and effort through the use of statistically
designed experiments can be demonstrated by the following comparison
with the classical one-at-a-time procedure.

In the above illustration let a dash mark represent a single determination.

Classical Procedure:

The effect of temperature is determined by comparing the average of
duplicate determinations at each of the two temperatures for the first
pressure level. We repeat the process for the second pressure level. To
determine the effect of pressure we compare the average of duplicate
determinations at each of the two pressures for the first temperature level
and repeat the process for the second temperature.

Statistical Procedure:

The effect of temperature is determined by averaging over the two
pressure levels. That is, the value obtained for the condition of
"temperature one" and "pressure one" is averaged with the value obtained
for the condition of "temperature one" and "pressure two". The process is
repeated for "temperature two". The two averages obtained in this way
are compared to determine the effect of temperature. The effect of
pressure is determined in a similar way by averaging over the two temperature
levels.
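The averaging described under the statistical procedure can be sketched with one determination in each of the four temperature-pressure cells. The yields below are hypothetical, and the function name `main_effect` is introduced only for this illustration.

```python
# Hypothetical yields, one determination per cell, keyed by (temperature, pressure).
y = {(1, 1): 60.0, (1, 2): 64.0,
     (2, 1): 70.0, (2, 2): 74.0}

def main_effect(y, factor):
    """Average response at level 2 minus average response at level 1 of one
    factor (0 = temperature, 1 = pressure), averaging over the other factor."""
    at_level_2 = [v for key, v in y.items() if key[factor] == 2]
    at_level_1 = [v for key, v in y.items() if key[factor] == 1]
    return sum(at_level_2) / len(at_level_2) - sum(at_level_1) / len(at_level_1)

temperature_effect = main_effect(y, 0)   # (70 + 74)/2 - (60 + 64)/2
pressure_effect = main_effect(y, 1)      # (64 + 74)/2 - (60 + 70)/2
print(temperature_effect, pressure_effect)
```

Each of the four determinations enters both comparisons, so every effect is still a difference between averages of two values.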

In both cases we were comparing averages of duplicate determinations,
but in the statistical procedure we attained this precision with only half
the number of determinations used in the classical procedure. This
economy is made possible by removing two long-standing barriers, namely:

1. You can't average "apples and pears".

2. You can't vary more than one thing at a time.

The removal of these barriers and using each measurement or determination
for more than one purpose is mathematically possible if we assume that the
"error" created by changing the pressure in taking a measurement at
"temperature one" is equal to the "error" created by changing the pressure
in taking a measurement at "temperature two". If the effect of these two
factors upon each other is additive, this assumption is valid. By
additive is meant that if changing the pressure a given amount produces
a 15% increase in yield at "temperature one", changing the pressure the
same amount at "temperature two" will also produce a 15% increase in yield.

Algebraically, if

$A = B,$

$(A + C) = (B + C),$

$A - B = 0$

$(A + C) - (B + C) = 0$

This means that the error due to changing the pressure when measuring the
effect of temperature will cancel out, since measuring the effect of a
factor (or variable) is actually a process of subtraction and an evaluation
of the difference.
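The cancellation can be written out for the two-by-two case. The lines below are a sketch under an assumed additive model; the symbols $\mu$, $\tau_t$ and $\pi_p$ (overall level, temperature contribution, pressure contribution) are introduced here for illustration and are not the author's notation.

```latex
% Assumed additive model: response at temperature t and pressure p
y_{tp} = \mu + \tau_t + \pi_p, \qquad t, p \in \{1, 2\}

% Temperature effect, averaging over the two pressure levels
\tfrac{1}{2}(y_{21} + y_{22}) - \tfrac{1}{2}(y_{11} + y_{12})
  = (\mu + \tau_2 + \bar{\pi}) - (\mu + \tau_1 + \bar{\pi})
  = \tau_2 - \tau_1,
\qquad \bar{\pi} = \tfrac{1}{2}(\pi_1 + \pi_2)
```

The pressure contribution appears identically in both averages and drops out in the subtraction, which is the sense in which the "error" due to pressure cancels.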

If there are interaction (nonadditive) effects present, the above
additive relation still holds, but additional work must be done to separate
them from experimental error. This can only be done with statistical
procedures. Interaction can never be measured or calculated with the
classical procedure.

One of the major objectives of the statistical procedure is to obtain
a measure of experimental error (or reproducibility) with which to evaluate
the main factor and interaction effects so that variation due to chance
alone can be distinguished from differences due to assignable causes.

To get a measure of experimental error, at least duplicate determinations
must be made for each condition. In the above example this would require
doubling the number of determinations in the experiment under "Statistical
Procedure". This would now mean that we could compare averages of four
determinations. To make the experiment under "Classical Procedure"
comparable, we would have to double the number of determinations here also in
order to compare averages of four determinations.

Now a detailed comparison of the two procedures shows a wide divergence
in favor of the "Statistical Procedure". By means of this procedure the
total error in the above two-factor experiment can be divided into five
components:

1. Main effects.
   a. Temperature.
   b. Pressure.

2. Interaction.

3. Experimental error.
   a. Replication.
   b. Residual error.

It is assumed that the residual error is that portion of the total error
which remains after all the error due to assignable causes has been
removed. That is, the residual error is assumed to be due to chance
causes alone. As such, the residual error is used as a yardstick to
evaluate the main and interaction effects through the use of the F-test.

This test is a mathematically precise method for evaluating data to
distinguish between variations due to chance alone and differences
due to assignable causes.
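For the duplicated two-factor experiment, the division of the total variation and the resulting F ratios might be computed as in the sketch below. The data are hypothetical, the within-cell (replicate) variation is taken as the error yardstick, and the function name `anova_2x2` is an illustrative choice.

```python
def anova_2x2(cells):
    """F ratios for a replicated 2 x 2 factorial.

    cells[(i, j)] lists the replicate determinations at level i of factor A
    and level j of factor B (i, j in {0, 1}); equal replication is assumed.
    Each effect has one degree of freedom, so its mean square equals its
    sum of squares.
    """
    n = len(cells[(0, 0)])                                   # replicates per cell
    cell_mean = {k: sum(v) / n for k, v in cells.items()}
    grand = sum(cell_mean.values()) / 4
    a_mean = [(cell_mean[(i, 0)] + cell_mean[(i, 1)]) / 2 for i in (0, 1)]
    b_mean = [(cell_mean[(0, j)] + cell_mean[(1, j)]) / 2 for j in (0, 1)]

    ss_a = 2 * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = 2 * n * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = n * sum((cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in (0, 1) for j in (0, 1))
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    error_df = 4 * (n - 1)
    ms_error = ss_error / error_df                           # the "yardstick"
    return {"A": ss_a / ms_error, "B": ss_b / ms_error,
            "AB": ss_ab / ms_error, "error_df": error_df}

# Hypothetical duplicate yields; A = temperature, B = pressure.
cells = {(0, 0): [59.0, 61.0], (0, 1): [63.0, 65.0],
         (1, 0): [69.0, 71.0], (1, 1): [73.0, 75.0]}
f_ratios = anova_2x2(cells)
print(f_ratios)
```

Each ratio would then be compared with the tabulated F value for 1 and 4 degrees of freedom at the chosen risk level.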

In contrast, the "Classical Procedure" includes no means of determining:

1. Most efficient and economic experimental designs.

2. Interaction effects.

3. Residual error.

4. Difference between variations due to chance alone and differences
due to assignable causes.

The result of these deficiencies leaves only common sense and subjective
judgment (with all the attendant personal biases) to design experiments
and analyze data in the "Classical Procedure".

To demonstrate more clearly that more than one thing at a time can be
varied in the "Statistical Procedure", the following fractional factorial
design is presented:


[Diagram of the three-factor fractional factorial design; dashes mark the conditions measured.]


Measurements are made for only those conditions indicated by the dashes;
yet the effect of all three factors can be determined and evaluated if there
are no significant interactions present. This is only one-fourth the
amount of work required to obtain the same precision by the "Classical
Procedure". Truly a saving of time!
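A fractional factorial of this kind can be sketched as follows. The author's own layout is not legible here, so a standard half fraction of a two-level, three-factor design (third factor set equal to the product of the first two, levels coded -1 and +1) is used as a stand-in; the responses are hypothetical and purely additive.

```python
from itertools import product

# Half fraction of the 2 x 2 x 2 design: A and B chosen freely, C = A * B.
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def effect(runs, y, column):
    """Average response at the high level minus average at the low level."""
    high = [yi for run, yi in zip(runs, y) if run[column] == 1]
    low = [yi for run, yi in zip(runs, y) if run[column] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

# Hypothetical additive responses: 50 + 3*A + 2*B - 1*C, with no interaction.
y = [50 + 3 * a + 2 * b - 1 * c for a, b, c in runs]
effects = [effect(runs, y, k) for k in range(3)]
print(runs)
print(effects)

# Orthogonality: every pair of factor columns has a zero sum of products.
orthogonal = all(sum(r[p] * r[q] for r in runs) == 0
                 for p, q in ((0, 1), (0, 2), (1, 2)))
print(orthogonal)
```

Four runs recover all three main effects only because the responses contain no interaction; in this fraction the third factor is indistinguishable from the interaction of the first two, which is the condition the text attaches to the design.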



LINEAR  MODELS  IN  THE  ANALYSIS  OF  VARIANCE* 


M.  B.  Wilk 

Princeton  University 

Introduction.  In  recent  years  a  new  word  has  won  widespread  acceptance 
into  the  technical  language  of  statistics.  I  have  in  mind  the  term  "robust". 
This expression was introduced by Box [1] to characterize statistical tests
which  are  not  overly  sensitive  in  their  behavior  and  meaning  to  preliminary 
statistical  assumptions.  What  he  meant  us  to  understand  by  this  word  is 
strongly  suggested  by  its  dictionary  definition  (Webster's  2nd  Edition): 

"having  or  evincing  strength  or  vigorous 
health;  strong;  muscular;  vigorous;  sound." 

While the use of the word in statistics is new, the basic concern which
it reflects is not at all recent. For example, the introduction by Fisher fo]
of the device of deliberate randomization in experimentation was motivated by
a desire to provide a robust basis for statistical inference. Similarly, for
many years so-called non-parametric or distribution-free procedures have been
advocated to relieve inferences from the weight of assumptions whose
justification may be difficult or impossible.

In  addition  to  our  explicit  concern  with  the  relative  robustness  of 
significance  tests  and  estimation  procedures,  I  would  like  to  direct  some 
attention  to  the  question  of  robustness  of  statistical  experimental  designs 
and  of  statistical  models. 

As a simple example of non-robust experimental procedure consider the
situation suggested by (1).

(1)  $y = f(x; \alpha, \beta, \gamma, \ldots) + e.$

If it is known that the functional relation is given by (2),

(2)  $y = \alpha + \beta x + e,$

then we know that, with moderately reasonable statistical properties of the
errors, a "best" selection of values $x_i$ at which the responses $y_i$ should be
observed would be such as to maximize (3), which measures the dispersion of
the $x_i$ values.

(3)  $\sum_i (x_i - \bar{x})^2.$


*A talk given at the Second Conference on the Design of Experiments in
Army Research Development and Testing, Washington, D. C., October 19, 1956.
Prepared in connection with research sponsored by the Office of Ordnance
Research.

This effectively means that the preselected $x_i$ values should be concentrated
at the two extremes of the possible range of x. Clearly this design is not at
all robust since if there is, in fact, some curvature in the relation between
y and x, as for example in (4)

(4)  $y = \alpha + \beta x + \gamma x^2 + e,$

then we could get no clue of this from an experiment with all $x_i$ values at
the two ends.
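The trade-off can be seen in a small numerical sketch. Two hypothetical six-point designs on the range 0 to 1 are compared: the dispersion measure is the sum of squared deviations of the x values from their mean, and the curvature check rests on the fact that a quadratic term can be separated from a straight line only if at least three distinct x values are observed. The function names are illustrative.

```python
def dispersion(xs):
    """Sum of squared deviations of the design points from their mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)

def can_detect_curvature(xs):
    """A quadratic term is estimable only with three or more distinct x values."""
    return len(set(xs)) >= 3

endpoints = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]     # all points at the two extremes
spread = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]        # points spaced evenly over the range

for name, design in (("endpoints", endpoints), ("spread", spread)):
    print(name, round(dispersion(design), 3), can_detect_curvature(design))
```

The endpoint design gives the larger dispersion, and hence the more precise slope if the straight line is correct, but it offers no check on that assumption.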

As another example, consider the relationship of randomized complete
blocks and incomplete blocks designs. In the latter designs the presence of
unanticipated interactions cannot, in general, be easily detected and may in
consequence introduce serious errors into conclusions. In this sense,
complete blocks are more robust than incomplete blocks. On the other hand,
the use of complete blocks may lead to overly large uncontrolled variation,
with consequent concealment of effects of interest. Similarly, fractional
factorial designs will, in general, be less robust than full factorial designs
in that the confounding which occurs in the fractionated designs may be of
importance and go undetected.

In  contrast,  one  of  the  arguments  given  by  Fisher  /u,  p.  106/  in  support 
of  factorial  experiments  is  as  follows: 

"Any  conclusion  has  a  wider  inductive  basis  when  inferred 
from an experiment in which the quantities of other
ingredients have been varied, than it would have from any
amount  of  experimentation,  in  which  these  had  been  kept 
strictly  constant." 

The remainder of this paper is devoted to classification models and
regression models, with particular reference to their robustness
characteristics. My intention is to try to deal with general ideas and principles
rather than to attempt to convey any detailed methodology.

Analysis of Variance or Classification Models. I am sure everyone here
is familiar with models of the general appearance of (5).

(5)  $y = \mu + a + b + c + \cdots + e.$



Such models have been used increasingly widely in the past decade as a basis
for justifying the analysis of variance. It so happens that if one makes
some suitably chosen assumptions concerning this model, it is possible to
provide an elegant and rather complete mathematical-statistical justification
for the analysis of variance. Unfortunately, this justification does not
require any deep-rooted scrutiny of the meaning or possible origin of the
model. Due perhaps to the abstract treatment of these models, there have
occurred some conflicting views on appropriate interpretation of fairly simple
experimental situations, such as the mixed model case of a two-factor
experiment. The heart of this controversy lay in the treatment of the same
experimental situation in terms of different, arbitrary assumptions concerning the
components of the model.




It seems apparent that if these linear analysis of variance models are
to be useful for a wide range of experimental circumstances, then they must
have a robust status in the sense that they must derive their meaning and
properties not from arbitrary assumptions but rather from a very general
framework or concept of experimentation as a means of learning about the real
world, combined with such direct properties as the experimental design itself
possesses. The model must not depend on very special properties of specific
experimental situations.

Consider the essential ingredients of a simple two-factor experimental
situation. In such a situation, idealized, one would be concerned with
determining the effects on some response Y which are attributable to
variation in the levels of each of two factors, $\mathcal{A}$ and $\mathcal{B}$. Clearly, this
description is grossly incomplete, even for an idealized framework, for no provision
has been made for the implicit background or surroundings. To account for
some of this we introduce the notion of experimental units. For example, a
chemical engineer might wish to study the effect of column diameter and type
of packing on the maximum throughput in a packed column. Here the response
is to be maximum throughput, perhaps in pounds per hour or more likely in
pounds per square foot per hour; the factors (or independent variables) are
column diameter and packing; and the experimental units will summarize such
features as the method of determining the maximum throughput, the changes
which occur in the fluids and equipment employed, uncontrolled ambient
temperature and pressure changes, and so on. Clearly some properties of the
experimental units will be, essentially, constant for all units, while other
characteristics will fluctuate from one to the other.

Suppose factor $\mathcal{A}$ to have A levels and factor $\mathcal{B}$ to have
B levels, and let the indices i and j have range as given in (6).

(6)  $i = 1, 2, \ldots, A$
     $j = 1, 2, \ldots, B$.

For initial simplicity let us assume that all experimental units are identical.
Then it is reasonable, in many cases, to conceive of a number $Y_{ij}$, defined
in (7), namely

(7)  $Y_{ij}$ = true or typical response which would be observed
                from the treatment combination consisting of the
                ith level of factor $\mathcal{A}$ and the jth level of
                factor $\mathcal{B}$.


If we now use dots to denote means or averages, as exemplified in (8),

(8)  $Y_{i\cdot} = \frac{1}{B} \sum_{j=1}^{B} Y_{ij}$,

then we can write the algebraic identity given in (9).

(9)  $Y_{ij} = Y_{\cdot\cdot} + (Y_{i\cdot} - Y_{\cdot\cdot}) + (Y_{\cdot j} - Y_{\cdot\cdot}) + (Y_{ij} - Y_{i\cdot} - Y_{\cdot j} + Y_{\cdot\cdot})$
            $= \mu + a_i + b_j + (ab)_{ij}$.

It is apparent from their definition that the components of this population
model can be given a physical interpretation or meaning. This meaning
is suggested by the nomenclature defined in (10).

(10)  $\mu$ is the overall mean,
      $a_i$ is the main effect of level i of factor $\mathcal{A}$,
      $b_j$ is the main effect of level j of factor $\mathcal{B}$,
      $(ab)_{ij}$ is the interaction of level i of factor $\mathcal{A}$ with
      level j of factor $\mathcal{B}$.
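The identity (9) and the nomenclature (10) can be illustrated numerically. The following is a minimal sketch in Python, assuming a small hypothetical 2 x 3 table of true responses; the function name `decompose` and the numbers are illustrative only and are not taken from the paper.

```python
def decompose(Y):
    """Split a complete A x B table of true responses into mu, a_i, b_j, (ab)_ij."""
    A, B = len(Y), len(Y[0])
    mu = sum(sum(row) for row in Y) / (A * B)                           # Y..
    row_means = [sum(row) / B for row in Y]                             # Y_i.
    col_means = [sum(Y[i][j] for i in range(A)) / A for j in range(B)]  # Y_.j
    a = [m - mu for m in row_means]                                     # main effects of factor A
    b = [m - mu for m in col_means]                                     # main effects of factor B
    ab = [[Y[i][j] - row_means[i] - col_means[j] + mu for j in range(B)]
          for i in range(A)]                                            # interactions
    return mu, a, b, ab

# Hypothetical 2 x 3 table of true responses (illustrative numbers only).
Y = [[10.0, 12.0, 17.0],
     [14.0, 15.0, 16.0]]
mu, a, b, ab = decompose(Y)

# Identity (9): every cell is recovered from its four components.
rebuilt = [[mu + a[i] + b[j] + ab[i][j] for j in range(3)] for i in range(2)]
```

By construction the main effects sum to zero over the levels in the table, which is one way of seeing that their definition depends on which levels are included.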

Two important aspects of these defined components of the population
model should be made explicit. First, the definition of, for example, the
main effects of factor $\mathcal{A}$ depends crucially on which levels of the
factors are included in the experimental situation. Second, the relative and
absolute magnitudes of the interactions will depend on the scale of
measurement of the responses Y. Thus the same two factors may show important
interaction on one scale of response Y, and yet may show negligible
interaction on some other scale of response; for example, $g(Y) = \sqrt{Y}$.
For the very special and important case in which interactions are negligible,
the meaning of the main effects of factor $\mathcal{A}$ becomes independent,
in general, of the levels of factor $\mathcal{B}$ involved. This is formally
stated in (11), which follows directly from the definition of $(ab)_{ij}$.

(11)  $(ab)_{ij} = 0$ implies $Y_{ij} - Y_{\cdot j} = Y_{i\cdot} - Y_{\cdot\cdot} = a_i$.

It is, however, worth repeating that the relative size and importance of the
two-factor interactions depend not only on the mechanics of the situation
but also on the scale in which the responses are analyzed.
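The dependence of interaction on the scale of analysis can be made concrete. The sketch below assumes a hypothetical table whose true responses are exactly $(r_i + c_j)^2$, so that the square-root scale is additive; the values of `r` and `c` are invented for illustration.

```python
import math

def interactions(Y):
    """Return the table of (ab)_ij = Y_ij - Y_i. - Y_.j + Y.. for a complete table."""
    A, B = len(Y), len(Y[0])
    mu = sum(map(sum, Y)) / (A * B)
    rm = [sum(row) / B for row in Y]
    cm = [sum(Y[i][j] for i in range(A)) / A for j in range(B)]
    return [[Y[i][j] - rm[i] - cm[j] + mu for j in range(B)] for i in range(A)]

r = [1.0, 2.0, 4.0]          # hypothetical contributions of three levels of one factor
c = [0.5, 3.0]               # hypothetical contributions of two levels of the other
Y = [[(ri + cj) ** 2 for cj in c] for ri in r]

raw = interactions(Y)                                          # original scale
root = interactions([[math.sqrt(y) for y in row] for row in Y])  # square-root scale

max_raw = max(abs(v) for row in raw for v in row)
max_root = max(abs(v) for row in root for v in row)
```

On the original scale the interactions are appreciable, while on the square-root scale they vanish up to rounding, although the underlying situation is the same.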

The same notions may be extended to the more realistic case where
experimental units are different; that is, where unperceived or uncontrolled
variation in the background may condition or obscure our evaluation of the
effects of the factors. The population model then takes the form given in (12).

(12)  $Y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_k + p_{ijk}$.

In this expression, $e_k$ may be called the additive unit error and $p_{ijk}$
the interactive unit error. The population model components are now
defined with respect to the relevant population of experimental units and of
treatment combinations. The $e_k$ reflect variation among experimental units,
averaged over all treatments. The $p_{ijk}$ reflect interactions of treatment
combinations with experimental units.

Now as yet we have said nothing about an actual experiment; we have
simply developed a formal framework which we hope is sufficiently flexible
to fit most two-factor experimental situations reasonably well.

Suppose a factorial experiment is now carried out, as sketchily outlined
in (13).

(13)  (i)   Select a levels of factor $\mathcal{A}$; $a \le A$.
      (ii)  Select b levels of factor $\mathcal{B}$; $b \le B$.
      (iii) Have r replications of the selected a x b
            treatment combinations.


At this point it is necessary to inquire just how selection of levels
and allocation of experimental units are to be made. To the extent that
physical randomization (i.e., random numbers) is employed, objective
statistical-probability ideas can be used to make inferences from the actual
experimental observations to certain fairly well defined broader populations.
To the extent that randomization is not employed, broader inferences cannot
be based solely on statistical-probability notions.

If we have conformed "to all the principles of allowed witch-craft" (to use
a phrase due to W. S. Gosset, better known as 'Student') we can carry our
population models forward to a statistical model for the observations. Use
the notation defined in (14).


(14)  Let $u = 1, 2, \ldots, a$ and $v = 1, 2, \ldots, b$ denote selected
      levels of factors $\mathcal{A}$ and $\mathcal{B}$, in order of their
      random selection;
      let $f = 1, 2, \ldots, r$ denote replication of treatment (u, v);
      let $x_{uvf}$ represent the observation from replication f of
      treatment (u, v).

Then we can write a statistical model for the observations $x_{uvf}$ in the
form given in (15).

(15)  $x_{uvf} = \mu + \alpha_u + \beta_v + (\alpha\beta)_{uv} + e_{uvf}$.

This model derives from the population model by imposing the conditions
of the experimental design, including the randomization employed as well as
the pattern. An outline of the relationship is given in (16), using the
simplifying assumption that all $p_{ijk}$ are negligible.

(16)  Define the following design random variables:

      $\alpha_i^u$ = 1 if selection u corresponds to i in the population
                     of levels of factor $\mathcal{A}$,
                   = 0 otherwise.

      $\beta_j^v$  = 1 if selection v corresponds to j in the population
                     of levels of factor $\mathcal{B}$,
                   = 0 otherwise.

      $\delta_k^{uvf}$ = 1 if the fth replicate of selected treatment (uv)
                         falls on experimental unit k,
                       = 0 otherwise.

The properties of these random variables derive from the pattern of
random selection and allocation (i.e., the experimental design) employed.

We then have, with the simplifying assumption that $p_{ijk} = 0$,

      $\alpha_u = \sum_i \alpha_i^u a_i$,   $\beta_v = \sum_j \beta_j^v b_j$,

      $(\alpha\beta)_{uv} = \sum_i \sum_j \alpha_i^u \beta_j^v (ab)_{ij}$,   $e_{uvf} = \sum_k \delta_k^{uvf} e_k$.

The important point is that the properties of the components of the
statistical model for the observations follow from combination of the
population model (which was based on the rather general concept of a true
response) with the experimental design which is actually imposed by the
experimenter.

The implications of this model so far as interpretation of the analysis
of variance is concerned are partially indicated by the expectations of mean
squares given in Table 1.


                                Table 1

Due to                     d.f.           E.M.S.

Factor $\mathcal{A}$       a-1            $\sigma_e^2 + r(1 - b/B)\,\sigma_{ab}^2 + rb\,\sigma_a^2$
Factor $\mathcal{B}$       b-1            $\sigma_e^2 + r(1 - a/A)\,\sigma_{ab}^2 + ra\,\sigma_b^2$
Interaction                (a-1)(b-1)     $\sigma_e^2 + r\,\sigma_{ab}^2$
Residual                   ab(r-1)        $\sigma_e^2$

Definitions:  $\sigma_a^2 = \frac{1}{A-1} \sum_i a_i^2$,
              $\sigma_b^2 = \frac{1}{B-1} \sum_j b_j^2$,
              $\sigma_{ab}^2 = \frac{1}{(A-1)(B-1)} \sum_{ij} (ab)_{ij}^2$,
              $\sigma_e^2 = \frac{1}{P-1} \sum_k e_k^2$.

It will be seen from Table 1 that if B = b, that is, if all levels of
factor $\mathcal{B}$ in the population considered are studied in the
experiment, then the component of variation due to interactions,
$\sigma_{ab}^2$, does not contribute to the $\mathcal{A}$ mean square.
Contrariwise, if B » b,* so that only a small proportion of possible levels
of factor $\mathcal{B}$ are sampled and one wishes to make inferences
relative to the entire population of levels of factor $\mathcal{B}$, then
the interaction component of variation does contribute, on the average, to
the $\mathcal{A}$ mean square.

The fact that the results of Table 1 derive from the quite robust model
we have developed is one strong indication that the analysis of variance is
a meaningful procedure, without regard to more sophisticated assumptions.
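The role of the fraction b/B in the expectation of the mean square for factor $\mathcal{A}$ can be checked by complete enumeration in a small case. The sketch below is an illustration only: it assumes r = 1 and no unit errors, uses a hypothetical 3 x 4 table of true responses, and averages the $\mathcal{A}$ mean square over every possible selection of a = 2 and b = 2 levels, each selection being equally likely under random selection.

```python
from itertools import combinations

def components(Y):
    """Population main effects a_i and interactions (ab)_ij of a complete table."""
    A, B = len(Y), len(Y[0])
    mu = sum(map(sum, Y)) / (A * B)
    rm = [sum(row) / B for row in Y]
    cm = [sum(Y[i][j] for i in range(A)) / A for j in range(B)]
    a = [m - mu for m in rm]
    ab = [[Y[i][j] - rm[i] - cm[j] + mu for j in range(B)] for i in range(A)]
    return a, ab

def ms_A(Y, rows, cols):
    """Mean square for factor A from one selection of levels (r = 1, no unit error)."""
    a_sel, b_sel = len(rows), len(cols)
    row_means = [sum(Y[i][j] for j in cols) / b_sel for i in rows]
    grand = sum(row_means) / a_sel
    return b_sel * sum((m - grand) ** 2 for m in row_means) / (a_sel - 1)

# Hypothetical population of true responses, A = 3 and B = 4 levels.
Y = [[3.0, 7.0, 1.0, 4.0],
     [6.0, 2.0, 9.0, 5.0],
     [8.0, 8.0, 2.0, 0.0]]
A, B, a_sel, b_sel = 3, 4, 2, 2

# Average the mean square over all equally likely selections of levels.
selections = [(rows, cols)
              for rows in combinations(range(A), a_sel)
              for cols in combinations(range(B), b_sel)]
expected_ms = sum(ms_A(Y, rows, cols) for rows, cols in selections) / len(selections)

a, ab = components(Y)
var_a = sum(x * x for x in a) / (A - 1)
var_ab = sum(x * x for row in ab for x in row) / ((A - 1) * (B - 1))

# Table 1 entry for factor A with r = 1 and sigma_e^2 = 0.
formula = (1 - b_sel / B) * var_ab + b_sel * var_a
```

The enumerated average and the formula agree to rounding error. Setting `b_sel` equal to `B` removes the interaction contribution, as the text states.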

The results given in Table 1 involved the simplifying assumption that
the interactive unit errors, the $p_{ijk}$, which measure unit-treatment
interactions, were negligible. Moreover, the model used contained no provision
for either measurement errors or variabilities in preparation of treatments.
The results on expectations of mean squares under a more general model,
which does provide for such effects, are given in Table 2, with a notation
that lends itself readily to extension to more complex situations.


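                                Table 2

Due to                     E.M.S.

Factor $\mathcal{A}$       $\Sigma_0 + r\,\Sigma_{ab} + rb\,\Sigma_a$
Factor $\mathcal{B}$       $\Sigma_0 + r\,\Sigma_{ab} + ra\,\Sigma_b$
Interaction                $\Sigma_0 + r\,\Sigma_{ab}$
Residual

Definitions:

$\Sigma_a = \sigma_a^2 - \frac{1}{B}\sigma_{ab}^2 - \frac{1}{P}\sigma_{ae}^2 + \frac{1}{BP}\sigma_{abe}^2$

$\Sigma_b = \sigma_b^2 - \frac{1}{A}\sigma_{ab}^2 - \frac{1}{P}\sigma_{be}^2 + \frac{1}{AP}\sigma_{abe}^2$

$\Sigma_{ab} = \sigma_{ab}^2 - \frac{1}{P}\sigma_{abe}^2$

$\Sigma_e = \sigma_e^2 - \frac{1}{A}\sigma_{ae}^2 - \frac{1}{B}\sigma_{be}^2 + \frac{1}{AB}\sigma_{abe}^2$

$\Sigma_{ae} = \sigma_{ae}^2 - \frac{1}{B}\sigma_{abe}^2$

$\Sigma_{be} = \sigma_{be}^2 - \frac{1}{A}\sigma_{abe}^2$

$\Sigma_{abe} = \sigma_{abe}^2 = \frac{1}{(A-1)(B-1)(P-1)} \sum_{ijk} (p_{ijk} - p_{i\cdot k} - p_{\cdot jk})^2$

$\Sigma_0 = \sigma^2 + \Sigma_{abe} + \Sigma_{ae} + \Sigma_{be} + \Sigma_e$

$\sigma^2$ = variance of "technical errors"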
P = size of population of experimental units.

* The symbol » is used to denote "much larger than."

A close inspection of the results of Table 2 will show that the existence
of unit-treatment interactions occasions a bias in the analysis of variance
in the sense that unbiased estimates of error cannot in general be obtained.
If the size of the relevant population of experimental units is large,
however, this bias is negligible.

Thus we see that, with appropriate interpretation, classification models
can be given a robust status. Such models can be used whenever factor levels
are distinguishable either qualitatively or quantitatively. They help in
several ways: (1) They provide a formal structure whose relation to the
populations of interest is usually well-defined. This helps in interpreting
the actual experimental results in terms of the broader populations of concern.
(2) Properly used and interpreted, these models help provide insight into the
physical meaning of terms such as "effects" and "interactions of factors."
(3) The use of the models brings out into the open the assumptions
(or conditions) which may be necessary for an unambiguous interpretation of
the analysis of variance. In the same way they help in evaluating the
possible direction of misinterpretation if assumptions fail. (4) By appropriate
statistical analysis (as, for example, by finding a scale for analysis on
which interactions are negligible) we may be led to simplified and hence
more developed models.

The main deficiency of these general classification models, and it is
overwhelmingly important, is that the classification models do not directly
concern themselves with functional relations between response and factors
or independent variables. If quantitative information on factors is
available, the use of a classification model will simply ignore this
information, obviously an undesirable feature. Thus, as ordinarily employed,
classification models when properly interpreted do not require sophisticated
information to be useful, but by the same token they do not lead to
sophisticated insights.

Further published work on classification models can be found in the
references.

Polynomial Regression Models.  Another type of model widely used in
statistical analysis of experimental data is the polynomial regression model,
such as in (17).

(17)  $y = \alpha_{00} + \alpha_{10} z_1 + \alpha_{02} z_2 + \alpha_{11} z_1^2 + \alpha_{22} z_2^2 + \alpha_{12} z_1 z_2 + \text{error}$.

It is easily seen that such models, as well as the classification models,
can be put in the form of a linear multiple regression model such as (18).

(18)  $y = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m + \text{error}$.

The appropriate correspondence for regression models is indicated in (19).

(19)  $\alpha_{00} = \beta_0$,  $1 = x_0$,  ...

To show the formal connection, for classification models, consider for
example a 2 x 2 factorial experiment. We could write a simplified
classification model as in (20).

(20)  $x_{11} = \mu \cdot 1 + a_1 \cdot 1 + a_2 \cdot 0 + b_1 \cdot 1 + b_2 \cdot 0 + e_{11}$,
      $x_{12} = \mu \cdot 1 + a_1 \cdot 1 + a_2 \cdot 0 + b_1 \cdot 0 + b_2 \cdot 1 + e_{12}$,
      $x_{21} = \mu \cdot 1 + a_1 \cdot 0 + a_2 \cdot 1 + b_1 \cdot 1 + b_2 \cdot 0 + e_{21}$,
      $x_{22} = \mu \cdot 1 + a_1 \cdot 0 + a_2 \cdot 1 + b_1 \cdot 0 + b_2 \cdot 1 + e_{22}$.

Clearly this has the same formal structure as the multiple regression model,
with the x's taking on the values 0 and 1, appropriately, and the
parameters of the classification model playing the role of regression
coefficients.
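The 0 and 1 values in (20) can be collected into a design matrix, one row per treatment combination. The following sketch builds that matrix and evaluates the error-free part of each response; the parameter values in `theta` are hypothetical and chosen only to make the arithmetic visible.

```python
# Columns of the 0/1 design matrix: x0 (multiplying mu), then indicators
# multiplying a1, a2, b1, b2, in the order used in (20).
def design_row(i, j):
    return [1, int(i == 1), int(i == 2), int(j == 1), int(j == 2)]

cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
X = [design_row(i, j) for i, j in cells]

# Hypothetical parameter values (mu, a1, a2, b1, b2), for illustration only.
theta = [10.0, 1.5, -1.5, -2.0, 2.0]

# Error-free response for each cell: sum of parameter times its x value,
# exactly the multiple regression form with the x's equal to 0 or 1.
responses = [sum(p * x for p, x in zip(theta, row)) for row in X]
```

Each row of `X` reproduces the pattern of ones and zeros in the corresponding line of (20).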

While this formal identification is sometimes convenient in allowing
a certain unity and elegance in mathematical developments concerning least
squares and analysis of variance theory, there are important logical and
practical distinctions between classification and regression models.

In a regression model such as (17), the values of $z_1$ and $z_2$ are
quantitative identifications or descriptions of the levels of two factors
under study, and it is ordinarily implicit that the values of the z's
are sufficient to summarize the important characteristics of the actual
factor levels used in the experiment, in the sense, for example, that we
ordinarily believe the application of a particular pressure to be summarized
by the number of pounds per square inch associated with the applied pressure.

While it is basic in a regression model that the factors be quantified,
their quantification known, and that the numerical measure be a complete
summary, the classification model does not require this information. On
the other hand, even for comparatively simple experimental situations the
number of parameters in a classification model can rapidly become very large
indeed. For example, if in a 5 x 5 x 5 factorial experiment we could ignore
three-factor interactions but no others, we would need 61 independent
parameters in a classification model. A moderately complex regression model
might employ 20 parameters. Clearly the classification model assumes
less, but also accomplishes less.
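The count of 61 can be verified by adding one parameter for the mean, (n - 1) for each main effect, and products of (n - 1) for each interaction retained. The sketch below does this; the comparison with a regression model assumes, purely as one possible reading of "moderately complex", a full cubic polynomial in three variables, which happens to have 20 coefficients. The function names are illustrative.

```python
from itertools import combinations
from math import comb

def classification_params(levels, max_order):
    """Independent parameters: mean plus effects and interactions up to max_order factors."""
    total = 1                                   # overall mean
    for order in range(1, max_order + 1):
        for subset in combinations(levels, order):
            term = 1
            for n in subset:
                term *= n - 1                   # independent contrasts per factor
            total += term
    return total

def full_polynomial_params(n_vars, degree):
    """Number of monomials of total degree <= degree in n_vars variables."""
    return comb(n_vars + degree, degree)

# 5 x 5 x 5 factorial, three-factor interactions ignored: 1 + 3*4 + 3*16.
n_class = classification_params([5, 5, 5], max_order=2)

# Assumed comparison: a full cubic in three quantitative variables.
n_cubic = full_polynomial_params(3, 3)
```

Retaining the three-factor interactions as well (`max_order=3`) gives 125 parameters, one per treatment combination.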

It  has  been  said  of  the  popularity  of  the  assumption  of  normal  or 
Gaussian  distribution  that  "everybody  believes  in  the  law  of  errors,  the 
experimenters because they think it is a mathematical theorem, the
mathematicians because they think it is an experimental fact." One wonders
whether  a  similar  remark  might  not  be  appropriate  to  the  popularity  of 
polynomial  regression. 

The basic mathematical theorems are due to Taylor and to Weierstrass.
Taylor's Theorem tells us that if a function f(x) has derivatives up to
order n, then f(x) may be expanded as a power series of the form shown
in expression (21). In this expression $x_0$ is some preselected value of
x, and $R_n$ is the remainder after n terms of the expansion. The
coefficients $f'(x_0)$, $f''(x_0)$, and so on, are the first, second, etc.,
derivatives of f(x) evaluated at $x_0$. Thus they are constants,
independent of x, once $x_0$ is selected.

(21)  $f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{(x - x_0)^2}{2!} f''(x_0) + \cdots + \frac{(x - x_0)^n}{n!} f^{(n)}(x_0) + R_n(x, x_0)$.

The Weierstrass Theorem states that every function which is continuous
on a closed interval can be approximated on that interval as closely as we
please by a polynomial of sufficiently high degree.

The  practical  hope  derived  from  these  theorems  is  that  even  low  degree 
polynomials,  say  quadratics  and  cubics,  may  give  good  approximations  if  the 
interval  involved  is  not  too  large  and  the  function  is  fairly  smooth. 
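How the quality of a low degree approximation depends on the width of the interval can be seen in a simple case. The sketch below takes the exponential function as an assumed example of a smooth function, forms its second-degree Taylor polynomial about $x_0 = 0$, and compares the largest error on a wide and on a narrow interval.

```python
import math

def taylor_quadratic_exp(x, x0=0.0):
    """Second-degree Taylor polynomial of exp about x0, as in (21) with n = 2."""
    f0 = math.exp(x0)               # f, f' and f'' all equal exp(x0) for this function
    h = x - x0
    return f0 + h * f0 + h * h / 2.0 * f0

def max_error(half_width, points=201):
    """Largest |exp(x) - quadratic| on an even grid over [-half_width, half_width]."""
    xs = [-half_width + 2 * half_width * k / (points - 1) for k in range(points)]
    return max(abs(math.exp(x) - taylor_quadratic_exp(x)) for x in xs)

err_wide = max_error(1.0)      # interval [-1, 1]
err_narrow = max_error(0.1)    # interval [-0.1, 0.1]
```

Shrinking the interval by a factor of ten reduces the largest error by roughly a factor of a thousand here, in line with a remainder of third order.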

Two  basic  practical  facts  are:  first,  polynomial  models  have  been 
used  with  much  success  by  experimenters,  with  and  without  statistics; 
second,  the  estimation  of  unknown  parameters  in  polynomial  regression  models 
by  least  squares  leads  to  equations  which  are  linear  in  the  unknowns,  and 
hence  can  be  solved  by  more-or-less  routine  arithmetical  operations. 
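The linearity of the least squares equations can be shown directly. The sketch below sets up the normal equations for a polynomial in one variable and solves them by ordinary Gaussian elimination; the data are hypothetical and generated without error from a known quadratic, so the fitted coefficients can be checked by eye.

```python
def solve(M, v):
    """Solve the linear system M c = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (aug[i][n] - s) / aug[i][i]
    return c

def fit_polynomial(xs, ys, degree):
    """Least squares polynomial coefficients, constant term first."""
    p = degree + 1
    # Normal equations: sum_k c_k * sum(x^(j+k)) = sum(y * x^j), linear in the c_k.
    M = [[sum(x ** (j + k) for x in xs) for k in range(p)] for j in range(p)]
    v = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(p)]
    return solve(M, v)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 - 3.0 * x + 0.5 * x * x for x in xs]   # exact quadratic, no error
coeffs = fit_polynomial(xs, ys, 2)
```

The unknown coefficients enter the equations only to the first power, whatever the degree of the polynomial in x.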

An additional robust feature of regression models is that if inadequate
they are to some extent self-revealing. Thus, it is well known from least
squares theory that, with moderately reasonable behavior of "errors", the
residual sum of squares after fitting a given regression model will, when
divided by a suitable factor, often called the residual "degrees of freedom",
be an estimate of the residual variation, provided the model fitted was
appropriate. Thus if through replication or other information we have
independent knowledge of the magnitude of the error variance, then a check
can be made on the model used. This procedure is, in fact, properly regarded
as an analysis of variance technique and is an important part of the use of
regression models. Thus in the absence of knowledge of functional relations
among quantitative variables, polynomial regression models constitute
moderately robust vehicles for organizing, analyzing, and summarizing
experimental data. There are, however, a number of possible snags which
must be kept in mind.
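The check described above, in which replication supplies an independent estimate of error, can be sketched as follows. The example assumes hypothetical duplicate observations at four settings of a single variable, with group means following a curve, and fits a straight line to them. The residual sum of squares is split into a part due to replication ("pure error") and a part due to lack of fit, and the two mean squares are compared. No significance test is computed; the ratio is only an informal indicator.

```python
def lack_of_fit_ratio(data):
    """data maps each x to its replicate responses; the model fitted is y = c0 + c1*x."""
    xs = [x for x, reps in data.items() for _ in reps]
    ys = [y for reps in data.values() for y in reps]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    c1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    c0 = ybar - c1 * xbar
    resid_ss = sum((y - c0 - c1 * x) ** 2 for x, y in zip(xs, ys))
    # Pure error: variation of replicates about their own group means.
    pure_ss = sum((y - sum(reps) / len(reps)) ** 2
                  for reps in data.values() for y in reps)
    pure_df = n - len(data)
    lof_ss = resid_ss - pure_ss          # what the straight line fails to explain
    lof_df = len(data) - 2               # groups minus fitted parameters
    return (lof_ss / lof_df) / (pure_ss / pure_df)

# Hypothetical duplicates whose group means are 1, 4, 9, 16 (a curved relation).
data = {1.0: [1.1, 0.9], 2.0: [4.2, 3.8], 3.0: [9.1, 8.9], 4.0: [15.8, 16.2]}
ratio = lack_of_fit_ratio(data)
```

A ratio far above one, as here, suggests that the straight line is inadequate and that the model is revealing its own deficiency.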

Item  1:  However  good  the  fit  of  a  regression  model  over  the  range  of 
variables  for  which  data  are  available,  extrapolation  beyond 
the observed range is fraught with hazard, unless theory or
other  experiments  give  clear  indication  of  the  functional  form 
in  the  region  of  extrapolation. 

Item 2:  Despite the self-checking of the regression model, even
interpolation must be done carefully in that representation of the
model may be systematically bad over some regions. This aspect
can be studied and guarded against to some extent by the
computation and plotting of residuals. Happily, this practice
is being recommended increasingly these days.

Item 3:  The statistical methods for fitting regression models have good
properties when the independent variables are free of important
random errors. In many practical cases the independent variables
are not free of errors. Just how misleading this can be is a
topic which still needs much investigation.

Item 4:  An open question always exists as to the degree of the polynomial
which should be fitted. This problem becomes especially important
when no reliable independent estimate of errors exists. There are
real dangers in overfitting or underfitting and thereby assessing
the importance of various factors improperly.

Item 5:  When two or more variables are involved, it will often be sensible,
in principle, to examine several regression models simultaneously,
as for example those given in (22).

(22)  $y = \alpha_{00} + \alpha_{10} z_1 + \alpha_{01} z_2 + \alpha_{20} z_1^2 + \alpha_{02} z_2^2$,
      $y = \alpha_{00} + \alpha_{10} z_1 + \alpha_{01} z_2 + \alpha_{11} z_1 z_2 + \alpha_{02} z_2^2$,
      $y = \alpha_{00} + \alpha_{10} z_1 + \alpha_{01} z_2 + \alpha_{20} z_1^2 + \alpha_{11} z_1 z_2$,
      $y = \alpha_{00} + \alpha_{01} z_2 + \alpha_{11} z_1 z_2 + \cdots$,
      etc.

The computing labor involved will usually present a formidable
barrier, though automatic high-speed machines should eventually
overcome this.

Item 6:  The use of a standard shotgun technique such as fitting
polynomial models can discourage careful thinking about specific
situations by providing an easy but mediocre substitute. There
is a long run danger of replacing insight by formalized numerical
computations.

Item 7:  The use of regression models is usually predicated on the
assumption that the factor levels involved are completely identified
by the numbers associated with them. This may not be valid. For
example, if the deformation behavior of a substance is being
studied at, say, 5 levels of pressure, the relevant features of the
levels may be not only the final pressure but also the rate of
pressure increase, the mechanism of pressure application,
temperature increases due to the pressure, and so on. The factor levels
are then quite definitely distinguishable but not so precisely
identifiable by a single number. In such cases the results of
analysis by a classification model could differ importantly from
those from a regression model. The analysis based on the
classification model would be less specific, but usually more robust,
than the regression model analysis.

Item 8:  Regression models are usually frankly empirical. They are not,
in  general,  based  on  broad  theories  which  may  be  useful  in  wider 
circumstances.  Conversely,  the  unthinking  use  of  regression  models 
does  little  to  encourage  the  construction  of  broad  scientific 
theories. 

The listing of these items is not intended to disparage regression
models nor to discourage their use; rather, it is hoped that, as in the
case of classification models, the tool may be employed more efficiently
if its weaknesses are recognized.
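The suggestion in Item 5, that several regression models be examined side by side, is no longer a computational barrier. The sketch below fits three candidate models in two variables to the same hypothetical data and reports the residual sum of squares of each. The data are generated without error from a surface containing a cross-product term, so the model lacking that term is the only one that fits badly. The helper names and the data are assumptions made for illustration.

```python
def solve(M, v):
    """Solve the linear system M c = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (aug[i][n] - s) / aug[i][i]
    return c

def residual_ss(terms, z1, z2, y):
    """Fit y on monomials z1**p * z2**q for (p, q) in terms; return the residual SS."""
    X = [[a ** p * b ** q for p, q in terms] for a, b in zip(z1, z2)]
    k = len(terms)
    M = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    c = solve(M, v)
    return sum((yi - sum(ci * xi for ci, xi in zip(c, row))) ** 2
               for row, yi in zip(X, y))

# Hypothetical 3 x 3 grid of settings; the response has a pure cross-product term.
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
z1 = [g[0] for g in grid]
z2 = [g[1] for g in grid]
y = [1.0 + 2.0 * a - b + 3.0 * a * b for a, b in grid]

# Exponent pairs (p, q); the three models correspond to the first three lines of (22).
models = {
    "squares only":      [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2)],
    "cross + z2 square": [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)],
    "z1 square + cross": [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1)],
}
rss = {name: residual_ss(terms, z1, z2, y) for name, terms in models.items()}
```

All three models have five coefficients, yet only the two containing the cross-product term reproduce the data; comparing them simultaneously makes this visible at once.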

In a side by side discussion of both classification and regression models
there are implicit two challenges. One is the question of the relationships,
if any, between these two types of models. We have, after all, claimed
considerable generality for both types. Thus, despite their different
justifications and interpretations they must relate in some systematic way.

The second challenge is, of course, what to do about combined qualitative
and quantitative factors. Suppose, for example, we have an experiment
with one qualitative and one quantitative factor. One simple answer is to use
a distinct regression model for the quantitative factor for every level of
the qualitative factor. This may not be a bad procedure and sometimes will
have much to recommend it, but in general seems an inadequate substitute.
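The simple answer just mentioned, a distinct regression for each level of the qualitative factor, can be sketched as follows. The example borrows the packed column setting used earlier, with type of packing as the qualitative factor and column diameter as the quantitative one. The level names and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares straight line; returns (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

# Hypothetical data: packing type is qualitative, column diameter quantitative.
diameters = [2.0, 4.0, 6.0, 8.0]
throughput = {
    "packing P": [11.0, 15.0, 19.0, 23.0],   # exactly 7 + 2 * diameter
    "packing Q": [10.0, 11.0, 12.0, 13.0],   # exactly 9 + 0.5 * diameter
}

# One separate regression on diameter for every level of the qualitative factor.
lines = {level: fit_line(diameters, ys) for level, ys in throughput.items()}
```

Each level receives its own intercept and slope, so nothing is shared between levels. This is the sense in which the procedure, though serviceable, does not by itself relate the two kinds of factor.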

The  remainder  of  the  paper  is  devoted  to  a  brief  and  rather  superficial 
consideration  of  these  two  related  questions. 

Let us fix our attention on two factors $\mathcal{A}$ and $\mathcal{B}$
having A and B levels respectively. We know that we can, under quite general
conditions, write a population classification model as in expression (9) and
develop it into a statistical model for the observations as sketchily
indicated in expressions (15) and (16). If the levels of factors
$\mathcal{A}$ and $\mathcal{B}$ are quantitatively identified by the
variables u and v, then usually we can also write a polynomial regression
model such as (23).

(23)  $Y_{ij} = \alpha_0 + \alpha_1 u_i + \alpha_2 u_i^2 + \alpha_3 u_i^3 + \cdots$
             $+ \beta_1 v_j + \beta_2 v_j^2 + \beta_3 v_j^3 + \cdots$
             $+ \gamma_{11} u_i v_j + \gamma_{12} u_i v_j^2 + \gamma_{21} u_i^2 v_j + \cdots$

For a given range of levels in the populations of levels of factors
$\mathcal{A}$ and $\mathcal{B}$, we can now inquire what are the relations
between the components of the population classification and regression
models. Straightforward algebra leads us to the results given in
expression (24).

... involved. The difference between two main effects does not, however,
depend on what other levels of factor $\mathcal{A}$ are involved in the
experimental situation.
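The algebra connecting the two models can be sketched from (9) and (23) alone. The following is offered as a derivation under stated assumptions, not necessarily in the form the author used: the polynomial (23) is taken to have finitely many terms, written with general coefficients $\alpha_p$, $\beta_q$, $\gamma_{pq}$, and a bar denotes an average over the population of levels, for example $\overline{u^p} = \frac{1}{A}\sum_i u_i^p$ and $\overline{v^q} = \frac{1}{B}\sum_j v_j^q$.

```latex
\begin{align*}
\mu &= Y_{\cdot\cdot}
     = \alpha_0 + \sum_p \alpha_p\,\overline{u^p} + \sum_q \beta_q\,\overline{v^q}
       + \sum_{p,q} \gamma_{pq}\,\overline{u^p}\;\overline{v^q}, \\
a_i &= Y_{i\cdot} - Y_{\cdot\cdot}
     = \sum_p \Bigl(\alpha_p + \sum_q \gamma_{pq}\,\overline{v^q}\Bigr)
       \bigl(u_i^p - \overline{u^p}\bigr), \\
b_j &= Y_{\cdot j} - Y_{\cdot\cdot}
     = \sum_q \Bigl(\beta_q + \sum_p \gamma_{pq}\,\overline{u^p}\Bigr)
       \bigl(v_j^q - \overline{v^q}\bigr), \\
(ab)_{ij} &= Y_{ij} - Y_{i\cdot} - Y_{\cdot j} + Y_{\cdot\cdot}
     = \sum_{p,q} \gamma_{pq}\,\bigl(u_i^p - \overline{u^p}\bigr)\bigl(v_j^q - \overline{v^q}\bigr).
\end{align*}
```

In this form the main effect $a_i$ involves the averages $\overline{v^q}$ only through the coefficients $\gamma_{pq}$, and a difference $a_i - a_{i'}$ does not involve the averages $\overline{u^p}$ at all.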

The interactions $(ab)_{ij}$ will evidently be negligible if and only if
the coefficients $\gamma_{11}$, $\gamma_{12}$, $\gamma_{21}$, etc. are
negligible. If this is so, then we see that the definitions of the $a_i$
become independent of $v$, $v^2$, $v^3$, etc. In other words, if the
interaction coefficients, the $\gamma$'s, are negligible, the meaning of the
main effects of factor $\mathcal{A}$ becomes independent of which levels of
factor $\mathcal{B}$ are involved in the experimental situation.

It is tempting to think that if the main effects of factor $\mathcal{A}$
are small, then the regression coefficients $\alpha_1$, $\alpha_2$, etc.,
will be small. The relations of expression (24) show that this is not at all
necessarily so.
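A small numerical case shows how main effects can be negligible while a regression coefficient is large. The sketch assumes a hypothetical surface with only linear terms and one cross-product term, with coefficients chosen so that the cross-product term cancels the linear term in u when averaged over one particular set of levels of v.

```python
def main_effects_A(u, v, response):
    """Main effects a_i = Y_i. - Y.. for a surface evaluated on the grid of levels."""
    Y = [[response(ui, vj) for vj in v] for ui in u]
    row_means = [sum(row) / len(v) for row in Y]
    mu = sum(row_means) / len(u)
    return [m - mu for m in row_means]

alpha1, beta1, gamma11 = 10.0, 1.0, -5.0       # hypothetical coefficients

def surface(ui, vj):
    """True response: 2 + alpha1*u + beta1*v + gamma11*u*v."""
    return 2.0 + alpha1 * ui + beta1 * vj + gamma11 * ui * vj

u_levels = [0.0, 1.0, 2.0]

# Levels of v averaging 2: alpha1 + gamma11 * 2 = 0, so every main effect vanishes.
effects_balanced = main_effects_A(u_levels, [1.0, 3.0], surface)

# Levels of v averaging 0.5: the same surface now shows large main effects.
effects_shifted = main_effects_A(u_levels, [0.0, 1.0], surface)
```

The coefficient of u is 10 in both cases, yet the main effects of the first factor are zero for one choice of levels of the second factor and substantial for another. This is possible only because the interaction coefficient is not negligible.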

There is some suggestion on how to handle the combined
qualitative-quantitative case in expression (24). Suppose, for simplicity,
that a reasonable polynomial model would be as given in (26).

(26)  $Y_{ij} = \alpha_0 + \alpha_1 u_i + \alpha_2 u_i^2 + \beta_1 v_j + \beta_2 v_j^2 + \gamma_{11} u_i v_j$.

If the levels of factor $\mathcal{A}$ are not quantitatively identified,
then the $u_i$ values are unknown. If we superimpose on (26) the appropriate
classification population model, we obtain (27).

(27)  $Y_{ij} = \mu + a_i + \beta_{1i}(v_j - \bar{v}) + \beta_2 (v_j^2 - \overline{v^2})$.

Definition:  $\beta_{1i} = (\beta_1 + \gamma_{11} u_i)$.

In this model the unknown parameters are as listed in (28).

(28)  $\mu$, $\beta_{1i}$, $a_i$, $\beta_2$;  $\sum_i a_i = 0$.

This crossed population model can be carried forward into a statistical
model for the observations. The structure of the least squares estimates
of the parameters is given in (29).

The usefulness of such models will have to be learned by field trial,
as well as from further theoretical study.

Definitions:  $S_{vv} = \sum_j (v_j - \bar{v})^2$,
              $S_{xv} = \sum_j (x_{\cdot j} - x_{\cdot\cdot})(v_j - \bar{v})$,
              $S_{vv^2} = \sum_j (v_j - \bar{v})\,v_j^2$,
              $S_{xv^2} = \sum_j (x_{\cdot j} - x_{\cdot\cdot})\,v_j^2$.